SQL (Structured Query Language) is a domain-specific language used for managing and manipulating relational databases. It is particularly useful for handling structured data where there are relations between different entities/variables of the data.
Use this file to discover all available pages before exploring further.
SQL Anti-Patterns Overview
SQL, despite being a powerful language for database operations, has several common anti-patterns that can lead to performance issues, security vulnerabilities, and maintainability problems. Here are the most important anti-patterns to avoid when writing SQL code.
Using SELECT *
-- Anti-pattern: Using SELECT *SELECT * FROM customersJOIN orders ON customers.customer_id = orders.customer_id;-- Better approach: Specify only needed columnsSELECT customers.customer_id, customers.first_name, customers.last_name, orders.order_id, orders.order_date, orders.total_amountFROM customersJOIN orders ON customers.customer_id = orders.customer_id;
Avoid using SELECT * in production code. Instead, explicitly list the columns you need. This improves performance by reducing I/O and network traffic, prevents issues when table schemas change, and makes your code more maintainable by clearly showing which columns are being used.
Not Using Parameterized Queries
-- Anti-pattern: String concatenation for dynamic SQL-- (Pseudocode mixing application code and SQL)query = "SELECT * FROM users WHERE username = '" + username + "' AND password = '" + password + "'";-- Better approach: Use parameterized queries-- In a prepared statement (syntax varies by language/framework)query = "SELECT * FROM users WHERE username = ? AND password = ?";execute_query(query, [username, password]);-- Or using stored proceduresCREATE PROCEDURE authenticate_user @username VARCHAR(50), @password VARCHAR(100)ASBEGIN SELECT * FROM users WHERE username = @username AND password = @password;END;
Always use parameterized queries or prepared statements instead of concatenating strings to build SQL queries. This prevents SQL injection attacks and improves performance through query plan caching.
Inefficient Joins
-- Anti-pattern: Cartesian product (implicit cross join)SELECT customers.name, orders.order_idFROM customers, ordersWHERE customers.customer_id = orders.customer_id;-- Better approach: Explicit JOIN syntaxSELECT customers.name, orders.order_idFROM customersJOIN orders ON customers.customer_id = orders.customer_id;-- Anti-pattern: Joining unnecessary tablesSELECT customers.name, orders.order_idFROM customersJOIN orders ON customers.customer_id = orders.customer_idJOIN order_items ON orders.order_id = order_items.order_idJOIN products ON order_items.product_id = products.product_id;-- ^ Products table isn't needed if we're not using any columns from it-- Better approach: Only join necessary tablesSELECT customers.name, orders.order_idFROM customersJOIN orders ON customers.customer_id = orders.customer_id;
Use explicit JOIN syntax instead of implicit joins in the WHERE clause. Only join tables that are necessary for your query. Unnecessary joins can dramatically decrease performance, especially with large tables.
Not Using Indexes Properly
-- Anti-pattern: Not indexing columns used in WHERE, JOIN, or ORDER BYSELECT * FROM orders WHERE customer_id = 123;-- If customer_id isn't indexed, this will perform a full table scan-- Better approach: Create appropriate indexesCREATE INDEX idx_orders_customer_id ON orders(customer_id);-- Anti-pattern: Using functions on indexed columnsSELECT * FROM customers WHERE LOWER(email) = 'user@example.com';-- Index on email won't be used because of the LOWER function-- Better approach: Avoid functions on indexed columnsSELECT * FROM customers WHERE email = 'user@example.com';-- Or create a functional index if your DBMS supports itCREATE INDEX idx_customers_email_lower ON customers(LOWER(email));
Create appropriate indexes for columns used in WHERE clauses, JOIN conditions, and ORDER BY statements. Avoid using functions on indexed columns in WHERE conditions, as this can prevent the optimizer from using the index.
Using DISTINCT Unnecessarily
-- Anti-pattern: Using DISTINCT to hide join problemsSELECT DISTINCT customers.customer_id, customers.nameFROM customersJOIN orders ON customers.customer_id = orders.customer_id;-- Better approach: Fix the join to avoid duplicatesSELECT customers.customer_id, customers.nameFROM customersJOIN ( SELECT DISTINCT customer_id FROM orders) o ON customers.customer_id = o.customer_id;-- Or even better, if you just want customers who have orders:SELECT customers.customer_id, customers.nameFROM customersWHERE EXISTS ( SELECT 1 FROM orders WHERE orders.customer_id = customers.customer_id);
Avoid using DISTINCT as a quick fix for duplicate rows. It can hide join problems and hurt performance. Instead, fix the underlying issue by using the correct join type or restructuring your query.
Using Cursors Instead of Set-Based Operations
-- Anti-pattern: Using cursors for row-by-row processingDECLARE @customer_id INT;DECLARE @total_spent DECIMAL(10, 2);DECLARE customer_cursor CURSOR FORSELECT customer_id FROM customers;OPEN customer_cursor;FETCH NEXT FROM customer_cursor INTO @customer_id;WHILE @@FETCH_STATUS = 0BEGIN SELECT @total_spent = SUM(total_amount) FROM orders WHERE customer_id = @customer_id; UPDATE customers SET total_spent = @total_spent WHERE customer_id = @customer_id; FETCH NEXT FROM customer_cursor INTO @customer_id;ENDCLOSE customer_cursor;DEALLOCATE customer_cursor;-- Better approach: Use set-based operationsUPDATE customersSET total_spent = ( SELECT SUM(total_amount) FROM orders WHERE orders.customer_id = customers.customer_id);
Avoid using cursors for row-by-row processing when set-based operations can accomplish the same task. Set-based operations are typically much faster and more efficient in SQL.
Not Using Appropriate Data Types
-- Anti-pattern: Using inappropriate data typesCREATE TABLE products ( product_id VARCHAR(50), -- Using VARCHAR for a numeric ID price VARCHAR(10), -- Using VARCHAR for a numeric price created_at VARCHAR(20) -- Using VARCHAR for a date/time);-- Better approach: Use appropriate data typesCREATE TABLE products ( product_id INT PRIMARY KEY, price DECIMAL(10, 2), created_at TIMESTAMP);
Use appropriate data types for your columns. Using the wrong data type can lead to unexpected behavior, poor performance, and increased storage requirements. For example, use numeric types for numbers, date/time types for dates, and VARCHAR only for variable-length strings.
Storing Comma-Separated Values
-- Anti-pattern: Storing comma-separated valuesCREATE TABLE users ( user_id INT PRIMARY KEY, name VARCHAR(100), roles VARCHAR(255) -- Stores roles like 'admin,editor,viewer');-- Querying for users with a specific role is difficultSELECT * FROM users WHERE roles LIKE '%admin%';-- This could match 'admin', but also 'administrator' or 'badmin'-- Better approach: Normalize the dataCREATE TABLE users ( user_id INT PRIMARY KEY, name VARCHAR(100));CREATE TABLE roles ( role_id INT PRIMARY KEY, role_name VARCHAR(50));CREATE TABLE user_roles ( user_id INT, role_id INT, PRIMARY KEY (user_id, role_id), FOREIGN KEY (user_id) REFERENCES users(user_id), FOREIGN KEY (role_id) REFERENCES roles(role_id));-- Now querying for users with a specific role is straightforwardSELECT u.user_id, u.nameFROM users uJOIN user_roles ur ON u.user_id = ur.user_idJOIN roles r ON ur.role_id = r.role_idWHERE r.role_name = 'admin';
Avoid storing comma-separated values in a single column. This violates first normal form and makes it difficult to query, update, and maintain data integrity. Instead, normalize your data by creating appropriate related tables.
Not Using Transactions
-- Anti-pattern: Not using transactions for multi-statement operationsUPDATE accounts SET balance = balance - 100 WHERE account_id = 123;UPDATE accounts SET balance = balance + 100 WHERE account_id = 456;-- If the second statement fails, the first one still goes through-- Better approach: Use transactionsBEGIN TRANSACTION;UPDATE accounts SET balance = balance - 100 WHERE account_id = 123;UPDATE accounts SET balance = balance + 100 WHERE account_id = 456;-- Check if both updates succeededIF @@ROWCOUNT > 0 COMMIT TRANSACTION;ELSE ROLLBACK TRANSACTION;
Use transactions for operations that require multiple statements to be executed as a single unit. This ensures data consistency and integrity, especially for operations like transfers or complex updates.
Using NOT IN with NULL Values
-- Anti-pattern: Using NOT IN with a subquery that might return NULLSELECT * FROM customersWHERE customer_id NOT IN (SELECT customer_id FROM orders WHERE total_amount > 1000);-- Better approach: Use NOT EXISTS or LEFT JOIN / IS NULL-- Option 1: NOT EXISTSSELECT * FROM customers cWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.total_amount > 1000);-- Option 2: LEFT JOIN / IS NULLSELECT c.*FROM customers cLEFT JOIN orders o ON c.customer_id = o.customer_id AND o.total_amount > 1000WHERE o.order_id IS NULL;
Avoid using NOT IN with a subquery that might return NULL values. In SQL, comparing anything to NULL using NOT IN returns UNKNOWN, which is treated as FALSE in a WHERE clause. Use NOT EXISTS or LEFT JOIN / IS NULL instead.
Using COUNT(*) to Check Existence
-- Anti-pattern: Using COUNT(*) to check existenceIF (SELECT COUNT(*) FROM orders WHERE customer_id = 123) > 0 -- Customer has ordersELSE -- Customer has no orders-- Better approach: Use EXISTSIF EXISTS (SELECT 1 FROM orders WHERE customer_id = 123) -- Customer has ordersELSE -- Customer has no orders
Use EXISTS instead of COUNT() to check if records exist. EXISTS is more efficient because it stops scanning as soon as it finds a matching record, while COUNT() needs to scan all matching records.
Using OFFSET for Pagination
-- Anti-pattern: Using OFFSET for pagination with large offsetsSELECT * FROM productsORDER BY product_idLIMIT 10 OFFSET 10000;-- Better approach: Use keyset pagination-- Assuming the last product_id from the previous page was 10000SELECT * FROM productsWHERE product_id > 10000ORDER BY product_idLIMIT 10;
Avoid using OFFSET for pagination with large offset values. OFFSET requires the database to scan and discard all rows up to the offset, which becomes increasingly inefficient as the offset grows. Use keyset pagination (also known as cursor-based pagination) instead, which uses a filter on the last seen value.
Using OR Conditions on Different Columns
-- Anti-pattern: Using OR conditions on different columnsSELECT * FROM productsWHERE category = 'Electronics' OR brand = 'Samsung';-- Better approach: Use UNIONSELECT * FROM products WHERE category = 'Electronics'UNIONSELECT * FROM products WHERE brand = 'Samsung';
Consider using UNION instead of OR conditions on different columns, especially if you have indexes on those columns. The optimizer can use different indexes for each part of the UNION, potentially improving performance.
Not Using Common Table Expressions (CTEs)
-- Anti-pattern: Repeating complex subqueriesSELECT customer_id, ( SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.customer_id ) AS order_count, ( SELECT SUM(total_amount) FROM orders WHERE orders.customer_id = customers.customer_id ) AS total_spentFROM customers;-- Better approach: Use Common Table Expressions (CTEs)WITH customer_orders AS ( SELECT customer_id, COUNT(*) AS order_count, SUM(total_amount) AS total_spent FROM orders GROUP BY customer_id)SELECT c.customer_id, co.order_count, co.total_spentFROM customers cLEFT JOIN customer_orders co ON c.customer_id = co.customer_id;
Use Common Table Expressions (CTEs) to simplify complex queries, improve readability, and avoid repeating subqueries. CTEs can also be recursive, which is useful for hierarchical data.
Using Implicit Conversions
-- Anti-pattern: Implicit conversionsSELECT * FROM orders WHERE order_id = '1000';-- order_id is an INT, but we're comparing it to a string-- Better approach: Use explicit conversions or correct typesSELECT * FROM orders WHERE order_id = 1000;-- Or if you need to convertSELECT * FROM orders WHERE order_id = CAST('1000' AS INT);
Avoid implicit conversions between different data types. They can prevent the optimizer from using indexes and lead to unexpected results. Use the correct data type in your queries or explicit conversions when necessary.
Using Scalar User-Defined Functions in WHERE Clauses
-- Anti-pattern: Using scalar UDFs in WHERE clausesCREATE FUNCTION dbo.GetFullName(@FirstName VARCHAR(50), @LastName VARCHAR(50))RETURNS VARCHAR(101)ASBEGIN RETURN @FirstName + ' ' + @LastName;END;SELECT * FROM employeesWHERE dbo.GetFullName(first_name, last_name) = 'John Doe';-- Better approach: Inline the function logicSELECT * FROM employeesWHERE first_name + ' ' + last_name = 'John Doe';
Avoid using scalar user-defined functions in WHERE clauses. They can prevent the optimizer from using indexes and are executed once per row, which can severely impact performance. Inline the function logic directly in your query when possible.
Not Using Schema Names
-- Anti-pattern: Not using schema namesSELECT * FROM customers;-- Better approach: Always use schema namesSELECT * FROM dbo.customers;
Always use schema names when referencing database objects. This improves query plan caching, prevents ambiguity if multiple schemas have objects with the same name, and makes your code more maintainable.
Using Table Variables When Temporary Tables Would Be Better
-- Anti-pattern: Using table variables for large result setsDECLARE @LargeResults TABLE ( ID INT, Name VARCHAR(100), Value DECIMAL(10, 2));INSERT INTO @LargeResultsSELECT ID, Name, Value FROM SomeLargeTable;-- Better approach: Use temporary tables for large result setsCREATE TABLE #LargeResults ( ID INT, Name VARCHAR(100), Value DECIMAL(10, 2));CREATE INDEX IX_LargeResults_ID ON #LargeResults(ID);INSERT INTO #LargeResultsSELECT ID, Name, Value FROM SomeLargeTable;
Use temporary tables (#temp) instead of table variables (@table) for large result sets or when you need indexes. Table variables store statistics only when they’re created and don’t support indexes (in most database systems), which can lead to poor query plans for large datasets.
Using DELETE Without WHERE
-- Anti-pattern: Using DELETE without WHEREDELETE FROM orders;-- Better approach: Always include a WHERE clauseDELETE FROM orders WHERE order_date < '2020-01-01';-- If you really want to delete all rows, be explicitTRUNCATE TABLE orders; -- Faster than DELETE without WHERE
Always include a WHERE clause with DELETE statements unless you explicitly want to delete all rows. If you do want to delete all rows, use TRUNCATE TABLE, which is faster and uses fewer system resources.
Not Using MERGE for Upserts
-- Anti-pattern: Using separate INSERT/UPDATE for upsertsIF EXISTS (SELECT 1 FROM customers WHERE customer_id = 123) UPDATE customers SET name = 'John Doe', email = 'john@example.com' WHERE customer_id = 123;ELSE INSERT INTO customers (customer_id, name, email) VALUES (123, 'John Doe', 'john@example.com');-- Better approach: Use MERGE for upsertsMERGE INTO customers AS targetUSING (SELECT 123 AS customer_id, 'John Doe' AS name, 'john@example.com' AS email) AS sourceON target.customer_id = source.customer_idWHEN MATCHED THEN UPDATE SET name = source.name, email = source.emailWHEN NOT MATCHED THEN INSERT (customer_id, name, email) VALUES (source.customer_id, source.name, source.email);
Use the MERGE statement for upsert operations (insert if not exists, update if exists) instead of separate INSERT and UPDATE statements. MERGE is more efficient and atomic, reducing the risk of race conditions.
Using Non-Standard SQL
-- Anti-pattern: Using database-specific features without necessity-- SQL Server specific:SELECT TOP 10 * FROM products ORDER BY price DESC;-- Better approach: Use standard SQL when possible-- Standard SQL (works in most databases):SELECT * FROM products ORDER BY price DESC LIMIT 10;
Use standard SQL features when possible to make your code more portable across different database systems. If you do need to use database-specific features, document them clearly and consider abstracting them in a way that makes them easier to change if needed.
Not Using Appropriate Constraints
-- Anti-pattern: Not using appropriate constraintsCREATE TABLE orders ( order_id INT, customer_id INT, order_date DATE, total_amount DECIMAL(10, 2));-- Better approach: Use appropriate constraintsCREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT NOT NULL, order_date DATE NOT NULL, total_amount DECIMAL(10, 2) NOT NULL CHECK (total_amount >= 0), FOREIGN KEY (customer_id) REFERENCES customers(customer_id));
Use appropriate constraints (PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK) to enforce data integrity at the database level. This prevents invalid data from being inserted and makes your database more reliable.