DB2 has various tools that will show you how it goes about executing a SQL statement. I like the db2expln tool. It also calculates a relative cost of operation for every SQL statement. You can use that to compare the performance of two alternative statements.
Let’s say that we have a table to store product categories called CATEGORY. A category can have child sub-categories. This relationship is stored in another table called CATEGORYREL. The CATEGORYREL table has these columns:
- CATEGORY_ID_PARENT – The primary key of the parent category.
- CATEGORY_ID_CHILD – The primary key of child category.
- SEQUENCE – The sorting sequence of the child within the parent.
We can query for all child categories of a parent using either of these two SQL statements:
select * from CATEGORY where CATEGORY_ID in (select CATEGORY_ID_CHILD from CATEGORYREL where CATEGORY_ID_PARENT=? order by sequence desc); select * from CATEGORY, CATEGORYREL where CATEGORY_ID = CATEGORY_ID_CHILD and CATEGORY_ID_PARENT=? order by sequence desc;
We know from experience that both will lead to the same query path in most modern RDBMS. But, let’s prove it using DB2 explain.
First, create a file called stmt.txt and enter the two SQL statements shown above. Make sure that the statements end with a “;”.
Next, run the db2expln command from DB2 command prompt using this format:
db2expln -database DATABASE_NAME -schema SCHEMA_NAME -terminal -stmtfile stmt.txt -package % -user USER_ID PASSWORD -terminator ;
db2expln -database STOREDB -schema DB2USER -terminal -stmtfile stmt.txt -package % -user db2user dbpass -terminator ;
For the first SQL statement, the command will show this:
SQL Statement: select * from CATEGORY where CATEGORY_ID in ( select CATEGORY_ID_CHILD from CATEGORYREL where CATEGORY_ID_PARENT=? order by sequence desc) Section Code Page = 1208 Estimated Cost = 0.037254 <-- Note: A Estimated Cardinality = 0.000000 Access Table Name = DB2USER.CATEGORYREL ID = 2,118 <-- Note: B | Index Scan: Name = DB2USER.I0000294 ID = 2 <-- Note: C | | Regular Index (Not Clustered) | | Index Columns: | | | 1: CATALOG_ID_LINK (Ascending) | #Columns = 2 | #Key Columns = 0 | | Start Key: Beginning of Index | | Stop Key: End of Index <-- Note: D | Data Prefetch: None | Index Prefetch: None | Lock Intents | | Table: Intent Share | | Row : Next Key Share | Sargable Predicate(s) | | #Predicates = 2 | | Insert Into Sorted Temp Table ID = t1 <-- Note: E | | | #Columns = 2 | | | #Sort Key Columns = 1 | | | | Key 1: SEQUENCE (Descending) | | | Piped Sorted Temp Table Completion ID = t1 Access Temp Table ID = t1 | #Columns = 2 | Relation Scan | | Prefetch: Eligible Nested Loop Join | Access Table Name = DB2USER.CATEGORY ID = 2,115 <-- Note: F | | #Columns = 10 | | Single Record | | Relation Scan | | | Prefetch: Eligible | | Lock Intents | | | Table: Intent Share | | | Row : Next Key Share | | Sargable Predicate(s) | | | #Predicates = 1 Return Data to Application | #Columns = 10
Now, let’s go through each interesting bits of information that I have marked as Note: A, B, C and so on.
A) This shows the estimated cost. Lower is better. You can use this value to compare two alternative query strategies. The value will depend on the amount of data you have. So, it’s best to run the explain command on a production sized database.
B) This shows that DB2 first starts working on the inner SELECT query (the one within the parenthesis).
C) The inner SELECT query has a WHERE clause using CATEGORY_ID_PARENT. This column is part of an index. Hence, DB2 starts doing an index scan. It is important to look for index scans which shows that our query is taking advantage of indices.
D) The start and stop index shows how much of the index is scanned. In our case, DB2 has done a full index scan. This is not ideal, but we have no way of avoiding this in our particular example.
E) The inner SELECT query has a ORDER BY clause. To perform sorting, DB2 will need to insert all selected index rows into a temporary table and then sort the temporary table.
F) Finally, DB2 does a nested loop join on the CATEGORY table. What does that mean? It simply goes through the sorted temporary table and then for each CATEGORY_ID_CHILD, looks up the corresponding row in the CATEGORY table. Yep, you heard that right. If the inner query produces N rows, then DB2 does N lookups of the outer table (CATEGORY).
Anyway, for the second SQL statement, you will see that DB2 takes the exact same steps and as a result has the same estimated cost.