SQL (Structured Query Language) is a domain-specific language used for managing and manipulating relational databases. It is particularly useful for handling structured data where there are relations between different entities/variables of the data.
SQL Anti-Patterns Overview
SQL, despite being a powerful language for database operations, has several common anti-patterns that can lead to performance issues, security vulnerabilities, and maintainability problems. Here are the most important anti-patterns to avoid when writing SQL code.
Using SELECT *
Avoid using SELECT *
in production code. Instead, explicitly list the columns you need. This improves performance by reducing I/O and network traffic, prevents issues when table schemas change, and makes your code more maintainable by clearly showing which columns are being used.
Not Using Parameterized Queries
Always use parameterized queries or prepared statements instead of concatenating strings to build SQL queries. This prevents SQL injection attacks and improves performance through query plan caching.
Inefficient Joins
Use explicit JOIN syntax instead of implicit joins in the WHERE clause. Only join tables that are necessary for your query. Unnecessary joins can dramatically decrease performance, especially with large tables.
Not Using Indexes Properly
Create appropriate indexes for columns used in WHERE clauses, JOIN conditions, and ORDER BY statements. Avoid using functions on indexed columns in WHERE conditions, as this can prevent the optimizer from using the index.
Using DISTINCT Unnecessarily
Avoid using DISTINCT as a quick fix for duplicate rows. It can hide join problems and hurt performance. Instead, fix the underlying issue by using the correct join type or restructuring your query.
Using Cursors Instead of Set-Based Operations
Avoid using cursors for row-by-row processing when set-based operations can accomplish the same task. Set-based operations are typically much faster and more efficient in SQL.
Not Using Appropriate Data Types
Use appropriate data types for your columns. Using the wrong data type can lead to unexpected behavior, poor performance, and increased storage requirements. For example, use numeric types for numbers, date/time types for dates, and VARCHAR only for variable-length strings.
Storing Comma-Separated Values
Avoid storing comma-separated values in a single column. This violates first normal form and makes it difficult to query, update, and maintain data integrity. Instead, normalize your data by creating appropriate related tables.
Not Using Transactions
Use transactions for operations that require multiple statements to be executed as a single unit. This ensures data consistency and integrity, especially for operations like transfers or complex updates.
Using NOT IN with NULL Values
Avoid using NOT IN with a subquery that might return NULL values. In SQL, comparing anything to NULL using NOT IN returns UNKNOWN, which is treated as FALSE in a WHERE clause. Use NOT EXISTS or LEFT JOIN / IS NULL instead.
Using COUNT(*) to Check Existence
Use EXISTS instead of COUNT() to check if records exist. EXISTS is more efficient because it stops scanning as soon as it finds a matching record, while COUNT() needs to scan all matching records.
Using OFFSET for Pagination
Avoid using OFFSET for pagination with large offset values. OFFSET requires the database to scan and discard all rows up to the offset, which becomes increasingly inefficient as the offset grows. Use keyset pagination (also known as cursor-based pagination) instead, which uses a filter on the last seen value.
Using OR Conditions on Different Columns
Consider using UNION instead of OR conditions on different columns, especially if you have indexes on those columns. The optimizer can use different indexes for each part of the UNION, potentially improving performance.
Not Using Common Table Expressions (CTEs)
Use Common Table Expressions (CTEs) to simplify complex queries, improve readability, and avoid repeating subqueries. CTEs can also be recursive, which is useful for hierarchical data.
Using Implicit Conversions
Avoid implicit conversions between different data types. They can prevent the optimizer from using indexes and lead to unexpected results. Use the correct data type in your queries or explicit conversions when necessary.
Using Scalar User-Defined Functions in WHERE Clauses
Avoid using scalar user-defined functions in WHERE clauses. They can prevent the optimizer from using indexes and are executed once per row, which can severely impact performance. Inline the function logic directly in your query when possible.
Not Using Schema Names
Always use schema names when referencing database objects. This improves query plan caching, prevents ambiguity if multiple schemas have objects with the same name, and makes your code more maintainable.
Using Table Variables When Temporary Tables Would Be Better
Use temporary tables (#temp) instead of table variables (@table) for large result sets or when you need indexes. Table variables store statistics only when they’re created and don’t support indexes (in most database systems), which can lead to poor query plans for large datasets.
Using DELETE Without WHERE
Always include a WHERE clause with DELETE statements unless you explicitly want to delete all rows. If you do want to delete all rows, use TRUNCATE TABLE, which is faster and uses fewer system resources.
Not Using MERGE for Upserts
Use the MERGE statement for upsert operations (insert if not exists, update if exists) instead of separate INSERT and UPDATE statements. MERGE is more efficient and atomic, reducing the risk of race conditions.
Using Non-Standard SQL
Use standard SQL features when possible to make your code more portable across different database systems. If you do need to use database-specific features, document them clearly and consider abstracting them in a way that makes them easier to change if needed.
Not Using Appropriate Constraints
Use appropriate constraints (PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK) to enforce data integrity at the database level. This prevents invalid data from being inserted and makes your database more reliable.