World of Kamsy: SQL Performance Optimization Tips

I was working with this topic during last few days and found some intersting topics. So I hope all of you will benifitited from these stuff.

SQL Performance Optimization Tips

1. Restrict the queries result set by using the WHERE clause as much as possible.

By returning data you don't need, you are causing SQL Server to perform I/O it doesn't need to perform, wasting SQL Server resources. In addition, it increases network traffic, which can also lead to reduced performance. And if the table is very large, a table scan will lock the table during the time-consuming scan, preventing other users from accessing it, hurting concurrency.

2. Try to restrict the queries result set by returning only the particular columns from the table, not all table's columns. Avoid using SELECT *

This can results in good performance benefits, because SQL Server will return to client only particular columns, not all table's columns. This can reduce network traffic and boost the overall performance of the query.

INSERT INTO dbo.my_table

SELECT * FROM dbo.another_table

Don’t use select * , ever in this type of queries.

INSERT INTO dbo.my_table(Numkey,Description,MoreStuff,Otherstuff)

SELECT atID,description,town,county FROM dbo.another_table

Using a column list, additional columns or change of column order will not cause a break.

3. Try to avoid using Server side cursors, whenever possible.

If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications. If you need to perform row-by-row operations, try to find another method to perform the task.

Here are some alternatives to using a cursor:

Ø Use WHILE LOOPS

Ø Use temp tables

Ø Use derived tables

Ø Use correlated sub-queries

Ø Use the CASE statement

Ø Perform multiple queries

More often than not, there are non-cursor techniques that can be used to perform the same tasks as a SQL Server cursor.

If you have no choice but to use a server-side cursor in your application, try to use a FORWARD-ONLY or FAST-FORWARD, READ-ONLY cursor. When working with unidirectional, read-only data, use the FAST_FORWARD option instead of the FORWARD_ONLY option, as it has some internal performance optimizations to speed performance. This type of cursor produces the least amount of overhead on SQL Server.

If you are unable to use a fast-forward cursor, then try the following cursors, in this order, until you find one that meets your needs. They are listed in the order of their performance characteristics, from fastest to slowest: dynamic, static, and keyset.

If you need to perform a JOIN as part of your cursor, keyset and static cursors are generally faster than dynamic cursors, and should be used when possible.

When you are done using a cursor, don't just CLOSE it, you must also DEALLOCATE it. De-allocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don't DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released.

4. To return the total table's row count, use alternative way instead of SELECT COUNT(*) statement. Can use “sysindexes” system table (View).

Because SELECT COUNT(*) statement make a full table scan to return the total table's row count, it can take very many time for the large table. There is another way to determine the total row count in a table. You can use “sysindexes” system table, in this case. There is ROWS column in the sysindexes table. This column contains the total row count for each table in your database. So, you can use the following select statement instead of SELECT COUNT(*):

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2;

So, you can improve the speed of such queries in several times.

NOTE : -

There is one potential downside to using the sysindexes table. And that this system table is not updated in real time, so it might underestimate the number of rows you actually have. Assuming you have the database option turned on to "Auto Create Statistics" and "Auto Update Statistics", the value you get should be very close to being correct, if not correct. If you can live with a very close estimate, then this is the best way to count rows in your tables.

5. Try to use constraints instead of triggers, whenever possible.

Constraints are much more efficient than triggers and can boost performance. So, you should use constraints instead of triggers, whenever possible.

However, referential integrity constraints cannot:

· Cascade changes through related tables in the database

· Enforce complex restrictions by referencing other columns or database objects

· Perform "what if" analyses

Also, referential integrity constraints do not roll back the current transaction as a result of enforcing data integrity. With triggers, you can either roll back or continue the transaction, depending on how you handle referential integrity.

Constraints, including check, referential integrity and cascading referential integrity, perform better than triggers. Cascading referential integrity constraints are automatic updates and deletes on dependant objects.

6. Use table variables instead of temporary tables.

Table variables, such as local variables, have a well defined scope at the end of which they are automatically cleared.

Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.

Ø Table variables are completely isolated to the batch that creates them so no 're-resolution' has to occur when a CREATE or ALTER statement takes place, which may occur with a temporary table. Temporary tables need this 're-resolution' so the table can be referenced from a nested stored procedure. Table variables avoid this completely so stored procedures can use plan that is already compiled, thus saving resources to process the stored procedure.

Transactions that involve table variables last only for the duration of an update on the table variable. Therefore, table variables require less locking and logging resources. Because table variables have limited scope and are not part of the persistent database, transaction rollbacks do not affect them.

In general, you use table variables whenever possible except when there is a significant volume of data and there is repeated use of the table. In that case, you can create indexes on the temporary table to increase query performance.

*Note - The table variables are available in SQL Server 2000 only.

7. Try to avoid the HAVING clause, whenever possible.

The purpose of using HAVING clause is to filter the results returned by a GROUP BY clause. Since we can not use WHERE clause to restrict values of aggregate functions we must use HAVING clause.

When you use GROUP BY with the HAVING clause, the GROUP BY clause divides the rows into sets of grouped rows and aggregates their values, and then the HAVING clause eliminates undesired aggregated groups. In many cases, you can write your select statement so, that it will contain only WHERE and GROUP BY clauses without HAVING clause. This can improve the performance of your query.

This is because of the HAVING clause is performed after the aggregate functions and grouping take place. But when we use WHERE clause it restricts the results first and then performs the GROUP BY and aggregation so it is always improve the performance of the query.

SELECT city,sum(sales) FROM dbo.salesmaster

GROUP BY city

HAVING sum(sales) > 2000;

This query is fine where as you have no any option other than using HAVING clause.

SELECT province, city, sum(sales) FROM dbo.salesmaster

GROUP BY city

HAVING province = ‘Western’;

This query can be optimized by using WHERE clause

SELECT province, city, sum(sales) FROM dbo.salesmaster

WHERE province = ‘Western’

GROUP BY city ;

8. Try to avoid using the unnecessary DISTINCT clause, whenever possible.

DISTINCT is separated into 2 processes by SQL - one to reduce the rows and one to calculate the returned results.

Using the DISTINCT keyword means that redundant data is being pulled from the database and then discarded. Usually it is better to rewrite the query's FROM and WHERE clauses to use a subquery to filter the data correctly so that you only get back what you want.

9. Include SET NOCOUNT ON statement into your stored procedures to stop the message indicating the number of rows affected by a T-SQL statement.

This can reduce network traffic, because your client will not receive the message indicating the number of rows affected by a T-SQL statement.

So it is recommended as a practice to use “SET NOCOUNT ON” in every SP you write unless you require the no of rows effected to return.

10. Use the select statements with TOP keyword or the SET ROWCOUNT statement, if you need to return only the first n rows.

This can improve performance of your queries, because the smaller result set will be returned. This can also reduce the traffic between the server and the clients.

The efficiency of Top 1 is usually the same as max() – but may depend upon the indexing, if a sort has to occur a worktable may be created – check with the query plan.

In the quest for unbreakable code and clarity if it is the max(imum) or min(imum) value that is required then use these functions

Select TOP 1 may return (logically) either the maximum or minimum value for a column, or some other non deterministic value.

Never use a TOP command without an order by statement

11. When deleting all the rows from a table use TRUNCATE instead of DELETE.

SQL's DELETE clause is more flexible than TRUNCATE TABLE because it allows you to specify which records to remove. In addition to that TRUNCATE TABLE does not log individual row deletes.

TRUNCATE TABLE is functionally identical to DELETE statement with no WHERE clause: both remove all rows in the table. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE.

The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.

TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. The counter used by an identity for new rows is reset to the seed for the column. If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement.

You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint; instead, use DELETE statement without a WHERE clause. Because TRUNCATE TABLE is not logged, it cannot activate a trigger.

You need to be db_owner, ddl_admin or owner of the table to be able to fire a TRUNCATE statement.

12. Index columns which are used in WHERE clauses.

Searching a value without an index will require full table scan (searching on each and every page files). This will produce a huge performance drop for larger tables. For optimizing such a query we have to avoid table scans by using indexes very intelligently.

Just like the reader searching for a word in a book, an index helps when you are looking for a specific record or set of records with a WHERE clause. This includes queries looking for a range of values, queries designed to match a specific value, and queries performing a join on two tables

Since index entries are stored in sorted order, indexes also help when processing ORDER BY clauses. Without an index the database has to load the records and sort them during execution.

Indexes are also useful for maintaining unique values in a column, since the database can easily search the index to see if an incoming value already exists. Primary keys are always indexed for this reason.

When you choose fields for create indexes consider the following points;

Ø Always try to choose smaller entities. Since larger index keys will cause the database to perform more disk reads, which limits throughput. Also index entries are often involved in comparisons, smaller entries are easier to compare.

Ø The most effective indexes are the indexes with a small percentage of duplicated values.

Indexes can be a huge lead to a very big performance drawback if used in careless manner. Here are some Index drawbacks;

Ø Indexes are a performance drag when the time comes to modify records. Any time a query modifies the data in a table the indexes on the data must change also.

Ø Indexes also use disk space. The exact size will depends on the number of records in the table as well as the number and size of the columns in the index. Generally this is not a major concern as disk space is easy to trade for better performance.

13. Minimize the use of not equal operations, <>, != or NOT. Try rephrasing the expression using ranges.

When we use not equal operators SQL Server has no any option other than run a table or index scan for responding to the query. Always we must try to avoid table or index scans to improve the performance. So it is a better practice to use ranges instead of <> or NOT.

Of course, you may not have any choice of which operator you use in your WHERE clauses, but sometimes you do.

Here are the key operators used in the WHERE clause, ordered by their performance. Those operators at the top will produce results faster than those listed at the bottom.

=
>, >=, <, <=
LIKE
<>

The lesson here is to use = as much as possible, and <> as least as possible.

14. DONOT start the name of a stored procedure with SP_.

This special prefix is reserved for system stored procedures. Although using this prefix will not prevent a user defined stored procedure from working (unless if you use a existing name in system sp’s), what it can do is to slow down its execution ever so slightly.

The reason for this is that by default, any stored procedure executed by SQL Server that begins with the prefix "sp_", is first attempted to be resolved in the Master database. Since it is not there, time is wasted looking for the stored procedure.

If SQL Server cannot find the stored procedure in the Master database, then it next tries to resolve the stored procedure name as if the owner of the object is "dbo". Assuming the stored procedure is in the current database, it will then execute. To avoid this unnecessary delay, don't name any of your stored procedures with the prefix "sp_".

15. To assign values to variable use SET unless the value is being retrieved from a table.

Assigning a constant value to a local variable

SET @myvariable = 10

Assigning a local variable the value of another variable

SET @myvariable = @someothervariable

Concatenating string variables

SET @mystring = @mystring + @anotherstring + ‘ this is the end’

Assigning values from datasets

SELECT @myvariable = column from dbo.mytable where xxx = yyy

Handling TOP

SELECT TOP 1 @myvariable = column from dbo.mytable

Handling max ( min, count, sum etc. )

SELECT @myvariable = max(column) from dbo.mytable

16. Try to avoid using sub queries within SELECTS (Correlated Subquery).

A correlated subquery is a SELECT statement nested inside another T-SQL statement, which contains a reference to one or more columns in the outer query. Therefore, the correlated subquery can be said to be dependent on the outer query.

SELECT CustName, Address, City, Country, PostalCode

FROM dbo.Customers

WHERE CustomerID IN (

SELECT CustomerID

FROM dbo.InvoiceMaster WHERE

dbo.InvoiceMaster.CustomerID = dbo.Customer.CustomerID

) ;

As you can see in the above query the inner query uses a value from outer query. This is called as Correlated Subquery.

A correlated subquery takes part of it's select from the outer query. In this way it is depenedent upon the outer query. Most times a correlated query has to execute once for every row in the outer query. So this can lead to a performance drop.

17. Always owner qualify your objects

Eg : - dbo.Customer

Look for objects that are not called with qualified owner names. Each time the query is run without qualified owner names on the objects referenced within the query, the Optimizer has to hold compile locks on system objects until it can determine if the objects are the same as the ones in the cached plan. Qualifying the owner name will solve this problem and help with performance and blocking problems.

So at last here are some links where you can find more .

http://www.sql-server-performance.com/vk_sql_best_practices.asp

http://www.sql-server-performance.com/gv_sp_views_selects.asp

http://www.sql-server-performance.com/misc_tips.asp

http://www.sqlservercentral.com/columnists/gvijayakumar/executionplans_printversion.asp

http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1170220_tax301334,00.html?adg=301324&bucket=ETA#row