Scanning an Index

The internet is full of information about indexes, and for good reason: indexes are probably the most important performance-related topic in databases. There are so many cases, properties, and ways to use indexes that there is simply a lot to write about. In this post I’d like to talk about a specific use case that I’ve seen a few times, related to index scans and performance.

A common recommendation is to create indexes on columns with high cardinality (many distinct values) that are often used in predicates. With a composite index (an index on multiple columns), if we decide to include a low-cardinality column, it usually shouldn’t be the first one in the index.

What I’m going to do here is give an example where these rules simply don’t work, and the exact opposite approach is the solution. This is why the answer to every question is “it depends” (as Tom Kyte always said), and why we need to really understand the query and the data before we can optimize anything.

The scenario

In this example I will use a table called tab with four columns: ID, DESCRIPTION, INSERT_DATE and STATUS. It has 1,000,000 rows, each with a different date (I started with Jan 1st 2016 and each row is 1 hour later than the previous one). The STATUS column is 0 in all rows except the last 10.

This is the creation and population script:

create table tab
(id number, description varchar2(100),
insert_date date,
status number);

declare
   v_desc varchar2(100):='Description of item number ';
   v_date date;
begin
   v_date:=to_date('01-JAN-2016 00:00:00','DD-MON-YYYY HH24:MI:SS');
   for i in 1..1000000
   loop
      insert into tab(id,description,insert_date,status)
      values(i,v_desc||i,v_date,0);
      v_date:=v_date+1/24;
   end loop;
   v_date:=v_date-10/24;
   update tab set status=1 where insert_date>=v_date;
   commit;
end;
/

exec dbms_stats.gather_table_stats(user,'TAB');

The queries against this table filter on INSERT_DATE and STATUS, so we need an index on these columns. As stated before, the low-cardinality column usually shouldn’t be the first one, but I would still like to index the STATUS column, so the index will be a composite index on INSERT_DATE first and then STATUS.

create index tab_date_status_idx on tab(insert_date,status);

However, the query does not use a simple equality predicate on the date; it retrieves rows with a specific status that are later than a specific date. The application also uses paging, so there is a ROWNUM limit and the query looks like this:

select *
from (
   select *
   from tab
   where insert_date>to_date('01-FEB-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')
   and status=0
   order by insert_date)
where rownum<=10;

Let’s see how this query performs:

Execution Plan
----------------------------------------------------------
Plan hash value: 1601125972

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |     5 |   435 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                     |       |       |            |          |
|   2 |   VIEW                        |                     |     5 |   435 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TAB                 |     5 |   245 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | TAB_DATE_STATUS_IDX |       |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   4 - access("INSERT_DATE">TO_DATE(' 2016-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       "STATUS"=0 AND "INSERT_DATE" IS NOT NULL)
       filter("STATUS"=0)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
6  consistent gets
0  physical reads
0  redo size
1272  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
10  rows processed

As we can see, the query uses the index for both columns, performs only 6 consistent gets, and runs very fast. Let’s see what happens when we ask for rows where STATUS=1. This is the query:

select *
from (
   select *
   from tab
   where insert_date>to_date('01-FEB-2016 00:00:00','DD-MON-YYYY HH24:MI:SS')
     and status=1
   order by insert_date)
where rownum<=10;

And these are the plan and statistics:


Execution Plan
----------------------------------------------------------
Plan hash value: 1601125972

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |    10 |   870 |  2848   (2)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                     |       |       |            |          |
|   2 |   VIEW                        |                     |   211K|    17M|  2848   (2)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TAB                 |   211K|     9M|  2848   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | TAB_DATE_STATUS_IDX |    10 |       |  1260   (2)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   4 - access("INSERT_DATE">TO_DATE(' 2016-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       "STATUS"=1 AND "INSERT_DATE" IS NOT NULL)
       filter("STATUS"=1)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
2935  consistent gets
0  physical reads
0  redo size
1313  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
10  rows processed

It still uses the index for both columns, but now it performs almost 3,000 consistent gets! Why?

Let’s think about what the index looks like. It has two columns: the first is INSERT_DATE and the second is STATUS. That means the index is ordered first by INSERT_DATE, with STATUS as a secondary sort. When we query INSERT_DATE>x and STATUS=y, Oracle starts at the first entry after INSERT_DATE=x and scans the index from that point onward. It cannot “jump” to STATUS=y because of the “greater than” predicate on the date (if the predicate were INSERT_DATE=x and STATUS=y, it could get to the correct place in the index immediately).

The STATUS values are in the index, so we don’t need to read any table blocks yet, but Oracle still has to scan the index entries until it finds what it needs. When STATUS=0, it finds 10 matching rows almost immediately (the first 10 entries in the range), so the query is fast and efficient. When STATUS=1, however, it has to scan the index from the first entry after INSERT_DATE=x until it finds 10 rows with STATUS=1, and those are the last 10 entries in the index. That takes considerably more time and effort.
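To make this concrete, here is a small Python sketch (a toy model of the scan behavior, not Oracle internals) that represents the index as a sorted list of (hour, status) entries and counts how many entries a range scan touches before collecting 10 matches:

```python
# Toy model of the composite index on (insert_date, status): a sorted list
# of (hour_offset, status) tuples mirroring the 1,000,000-row table above,
# where only the last 10 rows have status=1.
N = 1_000_000
index = [(h, 1 if h >= N - 10 else 0) for h in range(N)]  # already sorted

def entries_scanned(start, status, limit=10):
    """Count index entries touched before `limit` matches are found,
    scanning from the first entry at or after `start`."""
    scanned, found = 0, 0
    for _, s in index[start:]:  # the range scan starts at the date predicate
        scanned += 1
        if s == status:
            found += 1
            if found == limit:
                break
    return scanned

start = 31 * 24  # roughly 01-FEB-2016, expressed as an hour offset
print(entries_scanned(start, 0))  # 10      -- the first ten entries match
print(entries_scanned(start, 1))  # 999256  -- walks almost the whole index
```

The 10-versus-999,256 gap in entries touched is the same effect we see as 6 versus almost 3,000 consistent gets in the real plans.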

Now that we understand what Oracle does, the solution is relatively simple. Against the “rule of thumb” for indexes, we will drop the existing index and create one where STATUS is the first column and INSERT_DATE is the second. With such an index, when STATUS=0 Oracle will start at STATUS=0 and INSERT_DATE=x and find the rows quickly, as before. But when STATUS=1 it can also jump straight to STATUS=1 and INSERT_DATE=x and start from there, so it will find the rows quickly as well.
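The DDL for this step is not shown in the output above; assuming the original index name is reused (the later plans still report TAB_DATE_STATUS_IDX), the swap would look something like:

```sql
drop index tab_date_status_idx;

-- The low-cardinality STATUS column deliberately leads: the equality
-- predicate on it lets Oracle seek directly to the (status, insert_date)
-- starting point of the range scan.
create index tab_date_status_idx on tab(status, insert_date);
```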

With the new index, these are the plan and statistics where STATUS=0:


Execution Plan
----------------------------------------------------------
Plan hash value: 1601125972

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |     5 |   435 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                     |       |       |            |          |
|   2 |   VIEW                        |                     |     5 |   435 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TAB                 |     5 |   245 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | TAB_DATE_STATUS_IDX |       |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   4 - access("STATUS"=0 AND "INSERT_DATE">TO_DATE(' 2016-02-01 00:00:00', 'syyyy-mm-dd
       hh24:mi:ss') AND "INSERT_DATE" IS NOT NULL)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
6  consistent gets
0  physical reads
0  redo size
1272  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
10  rows processed

These are the plan and statistics where STATUS=1:

Execution Plan
----------------------------------------------------------
Plan hash value: 1601125972

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |    10 |   870 |  2219   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                     |       |       |            |          |
|   2 |   VIEW                        |                     |   211K|    17M|  2219   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TAB                 |   211K|     9M|  2219   (1)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | TAB_DATE_STATUS_IDX |    10 |       |   631   (2)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   4 - access("STATUS"=1 AND "INSERT_DATE">TO_DATE(' 2016-02-01 00:00:00', 'syyyy-mm-dd
       hh24:mi:ss') AND "INSERT_DATE" IS NOT NULL)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
6  consistent gets
0  physical reads
0  redo size
1313  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
10  rows processed

Summary

There are many scenarios similar to the one I showed here, and this example demonstrates very clearly the importance of understanding the query, the data, and what Oracle does behind the scenes in order to come up with a good solution.
