Wednesday, November 7, 2007

Oracle : Reducing join execution time

After executing a join over two tables, whith 4 million rows each and indexes in join columns, I realized that 4 minutes it's too much time waiting for results. So I decided to watch for the execution plan :

select columns from tableA join TableB on tableA.id = tableB.id

Execution Plan----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 FILTER
2 1 SORT (GROUP BY)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'TABLEA'
4 3 NESTED LOOPS
5 4 TABLE ACCESS (FULL) OF 'TABLEB'
6 4 INDEX (RANGE SCAN) OF 'TABLEA_U1' (UNIQUE)

All seems to be correct, after all I'm using Rule Base Optimizer. In order to reduce execution time I force the execution of an hash join instead of a Nested Loops.
Execution time droped to about 2 minutes and execution plan looks like this :

Execution Plan----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE (Cost=624 Card=40000 Bytes=2 800000)
1 0 FILTER

2 1 SORT (GROUP BY) (Cost=624 Card=40000 Bytes=2800000)
3 2 HASH JOIN (Cost=172 Card=40000 Bytes=2800000)
4 3 TABLE ACCESS (FULL) OF 'TABLEA' (Cost=35 Card=2000 Bytes=62000)
5 3 TABLE ACCESS (FULL) OF 'TABLEB' (Cost=136 Card=2000 Bytes=78000)

And all I've to do is change my query to :

select /*+ USE_HASH (tableA tableB ) */ columns from tableA join TableB on tableA.id = tableB.id

Why is this happening? why RBO this choose this plan? The answer is simple, RBO doesn't consider hash joins has a valid execution path, so you have to force it.
This specific join is used when a large amount of data needs to be joined or when a large fraction of the the table needs to be joined, however this is a very memory expensive operation and need to be carefully analised.

Last but not least, if you want that hash joins were a valid option just start using Cost Base Optimizer. If you don't decide yet what join you should use, just read this article.

No comments: