Tuesday, January 30, 2007

SQL Query Performance Tuning Part II

Avoid functions on RHS of the operator

Often developers use functions or method with their
SQL queries. Consider the following example. select *
from Customer
where YEAR(AccountCreatedOn) = 2005
and MONTH(AccountCreatedOn) = 6
Note that even though AccountCreatedOn has an index,
the above query changes the where clause such a way
that this index cannot be used anymore.
Rewriting the query in the following way will
increase the performance tremendously.
Select *
From Customer
Where AccountCreatedOn between '6/1/2005'
and '6/30/2005'

Check the null Separately

If you are checking for null in the query with some
value say in where condition id in (null,1,2,3) then
check the null separately and the values separately
Instead of
Select * from table where id in (null, 1,2,3)
Use
Select * from table where id=null and id in (1,2,3)
When you are checking a list of values as not in the
performance will be low. Try to avoid not in where
clause instead use in clause which will return record
faster.

Predetermine expected growth

As mentioned earlier indexes have a negative impact
on DML queries. One way to minimize this negative
affect is to specify an appropriate value for fill
factor when creating indexes. When an index is
created, the data for indexed columns is stored on the
disk. When new rows of data are inserted into the
table or the values in the indexed columns are
changed, the database may have to reorganize the
storage of the data to make room for the new rows.
This reorganization can take additional toll on DML
queries. However, if you expect new rows on a regular
basis in any table, you can specify the expected
growth for an index. The terminology used for this
expected growth is different in every database. The
following table lists the terms used by different
RDBMS for expected growth.
Oracle: PCTFREE - Percent Free
Specify optimizer hints in SELECT
Although in most cases the query optimizer will pick
the appropriate index for a particular table based on
statistics, sometimes it is better to specify the
index name in your SELECT query. For example, consider
the following SELECT *
FROM customer
WITH ( Index(IdxPhone))
WHERE city = 'New York City'
and phone = '212-555-1212'
Notice the additional "WITH" clause after FROM. This
example is specific to MS SQL Server. Every database
use different syntax for specifying this value and
they are quite different from each other.

Use EXPLAIN

Most databases return the execution plan for any
SELECT statement that is created by the optimizer.
This plan is very useful in fine tuning SQL queries.
The following table lists SQL syntax for different
databases.
Oracle: EXPLAIN PLAN FOR >Your query<

Avoid foreign key constraints

Foreign keys constraints ensure data integrity at the
cost of performance. Therefore, if performance is your
primary goal you can push the data integrity rules to
your application layer. A good example of a database
design that avoids foreign key constraints is the
System tables in most databases. Every major RDBMS has
a set of tables known as system tables. These tables
contain meta data information about user databases.
Although there are relationships among these tables,
there is no foreign key relationship. This is because
the client, in this case the database itself, enforces
these rules.

Monday, January 29, 2007

SQL Query Performance Tuning Part I

Dear Readers,

First I apologies for disappointing you by not
updating the blog regularly. I can blame my tight work
schedule; in this festival season. Recently I have
come across few books and web sites which talks about
SQL Query performance tuning. I found it interesting
and informative; go through it and leave your
comments.

SQL Query Performance Tuning

Even though relational databases and faster hardware
run most SQL queries with a significantly small
response time, there is always room for improvement.
When the data size is higher and concurrent hit on the
server is high the response time of the query really
matters in the performance of the application
irrespective of the technology in which it is built.

Performance tuning of the query has to be done to
reduce the response time. Performance tuning technique
will differ from situation to situation. Hence it is
purely a case to case basic. There is no common
performance tuning method for all situations.

Ex: Indexing is one of the performance tuning method
but higher the number of index will delay the response
time of all DML operations.

Following are few performances tuning techniques

Database statistics

The most important resource to any SQL optimizer is
the statistics collected for different tables within
the catalog. Statistics is the information about
indexes and their distribution with respect to each
other. Optimizer uses this information to decide the
least expensive path that satisfies a query. Outdated
or missing statistics information will cause the
optimizer to take a less optimized path hence
increasing the overall response time.

Following table lists SQL commands for different
database that is used to update statistics. For
further details, refer to the reference manual

Oracle: DBMS_UTILITY package

Optimizers always tend to select the least expensive
path \u2013 one that returns least number of rows in
fastest time. Why do optimizers rely on statistics?
Consider the following query
select *
from customer
where city = 'New York City'
and phone = '212-555-1212'
Notice that the above query contain two fields in the
"WHERE" clause and there are two indexes defined, each
containing one field. One very important notion to
remember is that the optimizer can only use ONE index
per table. Therefore, it has to make a decision as to
which index to use. Since phone number should return
least amount of rows, our query will run much faster
if the optimizer always uses IdxPhone. However, if
statistics are not updated, the optimizer does not
know which index is better and may decide to choose
IdxCity since 'city' field appears first in our WHERE
clause. Once you update statistics the database will
know more about the data distribution and will
correctly choose the better index to run your query.

Create optimized indexes

SQL optimizer heavily depends on indexes defined for
a particular table. Indexes are double-edged sword: no
index will degrade performance of your SELECT
statements and too many indexes will slow down your
DML (INSERT, UPDATE, and DELETE) queries. Therefore,
it is important to have a right balance of index on
tables. Besides the number of indexes, fields that are
involved and their order is also very important.

When creating indexes, estimate the number of unique
values the column(s) will have for a particular field.
For example, the idxCity index in our sample database
is not a good candidate for an index. When you wish to
search for customers in New York City, it can
potentially return thousands of rows, which are then
searched sequentially. Such indexes seldom help in
speeding up SELECT queries and reduce the response
time for DML queries.

Composite index - Indexes containing more than one
field are called composite index. Such indexes should
be created if you expect to run queries that will have
multiple fields in the WHERE clause and all fields
combined will give significantly less rows than the
first field alone.

For example, in our sample database the index
"IdxCustDate" is a composite index. This index is only
useful if the ratio between customers and the number
of their orders is high \u2013 meaning an average
customer places more than 1000 orders. Creating a
composite index on the Orders table when most
customers have placed only a handful of order wastes
not only spaces on the hard disk but have negative
impact on DML queries.

Clustered index - A clustered index determines the
physical order of data in a table - meaning the actual
data is sorted according to the fields in the index.
This is similar to a telephone directory, which
arranges data by last name. There can be only one
clustered index per table. These indexes are
particularly efficient on columns that are often
searched for range of values.
Although most databases support such index, they use
a different terminology. For example Oracle calls it
Index-Organized Table (IOT).