Optimize DB2 Query Using Explain Tool

DB2 has various tools that will show you how it goes about executing a SQL statement. I like the db2expln tool. It also calculates a relative cost of operation for every SQL statement. You can use that to compare the performance of two alternative statements.

Let’s say that we have a table to store product categories called CATEGORY. A category can have child sub-categories. This relationship is stored in another table called CATEGORYREL. The CATEGORYREL table has these columns:

  • CATEGORY_ID_PARENT – The primary key of the parent category.
  • CATEGORY_ID_CHILD – The primary key of child category.
  • SEQUENCE – The sorting sequence of the child within the parent.

We can query for all child categories of a parent using either of these two SQL statements:

select * from CATEGORY where CATEGORY_ID in 
    (select CATEGORY_ID_CHILD from CATEGORYREL where
    CATEGORY_ID_PARENT=? order by sequence desc);
select * from CATEGORY, CATEGORYREL where 
    CATEGORY_ID_PARENT=? order by sequence desc;

We know from experience that both will lead to the same query path in most modern RDBMS. But, let’s prove it using DB2 explain.

First, create a file called stmt.txt and enter the two SQL statements shown above. Make sure that the statements end with a “;”.

Next, run the db2expln command from DB2 command prompt using this format:

db2expln -database DATABASE_NAME -schema SCHEMA_NAME -terminal
    -stmtfile stmt.txt -package % -user USER_ID PASSWORD -terminator ;

For example:

db2expln -database STOREDB -schema DB2USER -terminal
    -stmtfile stmt.txt -package % -user db2user dbpass -terminator ;

For the first SQL statement, the command will show this:

SQL Statement:

select *
where CATEGORY_ID in (
  order by sequence desc)

Section Code Page = 1208

Estimated Cost = 0.037254   <-- Note: A
Estimated Cardinality = 0.000000

Access Table Name = DB2USER.CATEGORYREL ID = 2,118   <-- Note: B
| Index Scan: Name = DB2USER.I0000294 ID = 2    <-- Note: C
| | Regular Index (Not Clustered)
| | Index Columns:
| | | 1: CATALOG_ID_LINK (Ascending)
| #Columns = 2
| #Key Columns = 0
| | Start Key: Beginning of Index
| | Stop Key: End of Index   <-- Note: D
| Data Prefetch: None
| Index Prefetch: None
| Lock Intents
| | Table: Intent Share
| | Row : Next Key Share
| Sargable Predicate(s)
| | #Predicates = 2
| | Insert Into Sorted Temp Table ID = t1   <-- Note: E
| | | #Columns = 2
| | | #Sort Key Columns = 1
| | | | Key 1: SEQUENCE (Descending)
| | | Piped
Sorted Temp Table Completion ID = t1
Access Temp Table ID = t1
| #Columns = 2
| Relation Scan
| | Prefetch: Eligible
Nested Loop Join
| Access Table Name = DB2USER.CATEGORY ID = 2,115   <-- Note: F
| | #Columns = 10
| | Single Record
| | Relation Scan
| | | Prefetch: Eligible
| | Lock Intents
| | | Table: Intent Share
| | | Row : Next Key Share
| | Sargable Predicate(s)
| | | #Predicates = 1
Return Data to Application
| #Columns = 10

Now, let’s go through each interesting bits of information that I have marked as Note: A, B, C and so on.

A) This shows the estimated cost. Lower is better. You can use this value to compare two alternative query strategies. The value will depend on the amount of data you have. So, it’s best to run the explain command on a production sized database.

B) This shows that DB2 first starts working on the inner SELECT query (the one within the parenthesis).

C) The inner SELECT query has a WHERE clause using  CATEGORY_ID_PARENT. This column is part of an index. Hence, DB2 starts doing an index scan. It is important to look for index scans which shows that our query is taking advantage of indices.

D) The start and stop index shows how much of the index is scanned. In our case, DB2 has done a full index scan. This is not ideal, but we have no way of avoiding this in our particular example.

E) The inner SELECT query has a ORDER BY clause. To perform sorting, DB2 will need to insert all selected index rows into a temporary table and then sort the temporary table.

F) Finally, DB2 does a nested loop join on the CATEGORY table. What does that mean? It simply goes through the sorted temporary table and then for each CATEGORY_ID_CHILD, looks up the corresponding row in the CATEGORY table. Yep, you heard that right. If the inner query produces N rows, then DB2 does N lookups of the outer table (CATEGORY).

Anyway, for the second SQL statement, you will see that DB2 takes the exact same steps and as a result has the same estimated cost.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s