Thursday, 29 December 2016

Fixing Popular Posts Margin on Blogger

I use the Blogger platform for this blog, and I recently added the "Popular Posts" widget to the sidebar. Unfortunately it did not display correctly, with the first character or two of each blog post title being lost and chopped off, as if the whole thing had been shifted to the left for some reason. Here is how I fixed it to display properly.

After a lot of reading up on HTML and CSS I realised that these posts appeared within a section with an associated style name, in this case a class. In turn I could add an entry to the HTML template for my blog to define a shift to the right for this particular class style so that it would display properly. And this shift would only be applied to this section of the blog page and no other.

What this boils down to is the following:
  • The "Popular Posts" widget reference in the HTML has an attribute of "id='PopularPosts1'"
  • The list of posts within this widget has two class attributes, one of which is "popular-posts"
  • I could shift the list of posts far enough across to the right using the style property of "padding-left"
To apply this to your Blogger blog template do the following:
  • In Blogger click on "Template" on the left hand side of your blog dashboard (under Layout and above Settings)
  • Click the on "Edit HTML" button under the small image of your blog
  • Expand the "
    <b:skin></b:skin>
    " section near the top by clicking on it or on the right pointing triangle on the left hand side
    • This was at line 7 on my template
  • Scroll down to the end of this section, which finishes with
]]></b:skin>
  • Above this on their own lines add the following:
#PopularPosts1 .popular-posts {
 padding-left: 15px
}
  • Then click on the "Save Template" button at the top of the screen
  • If you now redisplay your blog you should see the "Popular Posts" list has now been shifted over to the right and aligned nicely under the "Popular Posts" heading
Note the following:
  • The first line states that this style is only to be used within a "popular-posts" class element that itself occurs within a "PopularPosts1" ID element.
    • This should match exactly the content section we want to be shifted across in the blog page
  • The second line sets the left hand side padding before the displayed content to be "15px" (pixels)
    • The value of "15px" was obtained by simple trial and error starting with smaller values until the list was shifted over enough.
I'm not saying my solution is perfect or absolutely correct in any way, I'm just saying that it works for me, and it seems to conform to the various ways CSS works and how Blogger defines its page sections. It would seem that all of the different sidebar elements have a right shift built into them, except for Popular Posts of just titles (no snippets or thumbnails). This solution adds in such a right shift so the blog post titles are now all visible in the blog page.

Monday, 12 December 2016

Announcing Bottleneck Data Solutions

SPOILER ALERT: This particular blog post is a blatant self promotion for me and the various Oracle database services I am offering.

I may not have mentioned it explicitly before but I have left the corporate world of permanent employment and have gone independent to offer my services direct to clients (at reasonable rates, of course). I'm doing this through my newly created company Bottleneck Data Solutions, which has its own associated, obligatory web site.

Basically I am offering my many years of knowledge, experience and expertise on Oracle database design and performance tuning directly to clients for short term consulting engagements or longer term contracts. I'm not big on doing the self promotion thing unfortunately, but I won't succeed unless I get the message out and publicise myself one way or another. So this is a one-off post to raise awareness of what I'm offering through Bottleneck Data Solutions and make you aware that I'm available for Oracle database performance tuning, database design and development engagements.

I have also uploaded some of the SQL Scripts that I commonly use, to share with everyone. Nothing revolutionary I am afraid, just the standard "free space" and "system activity summary" type reports.

At the moment I am offering to do a free, initial Performance Review of an AWR Report. If I can spot anything obvious affecting performance in the AWR Report then I'll provide feedback to you.

Why "Bottleneck Data Solutions"? Well data is everywhere, and is the lifeblood of the modern company. But I have seen a significant number of cases where slow application performance was caused by a badly designed and implemented database. In other words, the way the data was organized and stored in the database was becoming the bottleneck, limiting how quickly data could be supplied to the application software. Solving this kind of bottleneck requires a mix of short term performance tuning to deal with the most obvious hot spots, and a longer term review of the database and redesigning the most critical parts.

Thank you for your patience. The normal service of technical blog posts will be resumed again in the future.

Thursday, 1 December 2016

When 2 Queries Are Better Than 1

A general rule of thumb with queries in Oracle is to use just one query when you can and leave it to the Optimizer to work out the best possible execution plan i.e. decomposing one query into multiple separate queries can end up making the database do more work than it needs to. I came across a case the past week where the opposite was true - making Oracle do separate "smaller" queries was far quicker than using one overall query.

Background

Imagine that we have two tables - new data arrives in one table (INCOMING) and we process it in some way and insert the resultant data into another table (OUTGOING). Data records in both tables have a timestamp on them for when they were created, and they are inserted in timestamp order. We want to process new messages at regular intervals adding them to the OUTGOING table, but they are not deleted from the INCOMING table when processed. Instead a separate purge job runs each night to delete incoming data older than 48 hours. In this scenario we can use the timestamp on each record to work out the maximum timestamp of the OUTGOING data records we have processed so far, and only retrieve INCOMING data records that have been added since then.
select max (ts) from outgoing ;
An additional complexity in this is that the INCOMING data records fall into different categories or types, and these are processed separately to each other due to different processing rules and requirements. This means that when getting the maximum timestamp from the OUTGOING data set we also need to restrict on the category.
select max (ts) from outgoing 
where category = 'CATEGORY1' ;
But it also turns out that some categories are similar enough to each other that they can be processed together at the same time from INCOMING, which should be more efficient that executing multiple separate queries against what can be a large input table with historical data in it. This means we now need the maximum timestamp from OUTGOING across a number of categories. This is most obviously done with this query.
select max (ts) from outgoing 
where category in ( 'CATEGORY1' , 'CATEGORY2', 'CATEGORY3' ) ;
This is valid because these 3 categories are all extracted from INCOMING at the same time and processed together, so we do want the maximum timestamp across all of them together.

The Problem

When executed this "SELECT MAX (TS) WHERE CATEGORY IN (...)" query took a long time to execute (tens of seconds). Previous tests had shown that Oracle can efficiently use an index to get a minimum or maximum value from a table (sub-second), so why was this particular query taking so much longer?

The conclusion was that a single category maximum or minimum is executed differently to one on a group of categories, and this was the cause of the longer execution time. Furthermore, rewriting the query as a union of separate single category queries was much faster (over 100 times faster in this case!) and almost as fast as the original single category query.

Test Environment

This happened for real on Oracle 12c, on a non-CDB database. I'm going to replicate it on a separate test environment, also on Oracle 12c on a non-CDB database, running on Oracle Linux 7.
SQL> select * from v$version ;
BANNER                                                                           CON_ID
---------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production          0
PL/SQL Release 12.1.0.2.0 - Production                                                0
CORE        12.1.0.2.0        Production                                              0
TNS for Linux: Version 12.1.0.2.0 - Production                                        0
NLSRTL Version 12.1.0.2.0 - Production                                                0
I have taken a copy of the data in DBA_OBJECTS which in my test instance had over 90,000 rows in it occupying 12 MB of disk storage, and then doubled this 9 times to be 512 times larger than the original data set, giving over 45 million rows using over 5 GB of disk storage.

Initially I thought about using the CREATED column for the timestamp, and the OBJECT_TYPE for the category, but this ran into problems. First is that the doubling the rows each time does not change any data values, which means that we have very few different, distinct values for CREATED and each value repeats many times. This has an impact on calculations the Optimizer makes and the costs it estimates. Also the real data type in the real system was TIMESTAMP and not DATE. So I decided to add an extra column TS to the test table, and to update this later on to be a near-unique value. By giving the column an initial value it means that space is allocated within each record in each block in Oracle, and the later update should not cause any more space to be allocated or rows to be moved between blocks.

My test system has only 2 GB for UNDO, which means that I cannot do an UPDATE on a table larger than this. So what I have done is double the table 7 times to just over 1 GB in size, do the UPDATE on the TS to a near-unique random value, then double the table another 2 times. This means that each TS value occurs 4 times, which I hope will still be good enough for the tests I am doing.

In terms of the spread of values, in the real system the incoming data is purged every day, deleting data records older than 48 hours. The UPDATE is using the current system timestamp value, offset by up to 48 hours as a value in seconds.
create table outgoing
tablespace testdata
as select * from dba_objects ;
--
alter table outgoing modify (owner not null, object_name not null,
object_id not null, object_type not null, created not null) ;
--
-- Try and make the data loading go faster by minimising redo logging
alter table outgoing nologging ;
--
alter table outgoing add (ts timestamp default systimestamp not null) ;
--
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
--
update outgoing
set ts = systimestamp - numtodsinterval (dbms_random.value (0, 60*60*24*2), 'second') ;
commit ;
--
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
--
select count (*) cnt , count (distinct ts) ts_cnt from outgoing ;
--
select bytes / (1024 * 1024) mb
from user_segments
where segment_name = 'OUTGOING' ;
This produced 46,384,640 rows occupying 6.3 GB of disk storage. I also checked the minimum and maximum values of the timestamp column, and they did cover the past 48 hours (29-NOV-16 14.56.36.872969 to 01-DEC-16 14.56.36.863852).

Test Queries & Execution Plans

In the real system there are less than 10 categories, and the processing involves picking some of the ones that occur the most frequently. I'll use the OBJECT_TYPE for the category, and pick the top 3 most frequently occurring values for the test queries:
select object_type, count (*)
from outgoing
group by object_type order by 2, 1 ;

OBJECT_TYPE               COUNT(*)
----------------------- ----------
...
TABLE                      1217536
TYPE                       1296384
INDEX                      2208256
VIEW                       3252224
JAVA CLASS                15777792
SYNONYM                   18917376
So our test categories will be the bottom 3.

Let's create an index on the category and timestamp columns, again just like the real system:
create index ix_outgoing on outgoing (object_type, ts)
tablespace testdata ;
And let's make sure our statistics on this table are up to date:
exec dbms_stats.gather_table_stats ('JOHN', 'OUTGOING')
When I ran these test queries and then displayed the execution plans the costs shown seemed wrong for the second query of multiple categories. The reported cost was too low given the much higher elapsed time and what I knew it was doing during the execution. To show the real elapsed time I have done "set timing on" in SQL*Plus, and I have run the queries with the "gather_plan_statistics" hint, and used extra options to get the run time execution statistics for the query execution. This will show more about what really happened during the execution.

And to try and make things more even and comparable between the queries I also flushed the buffer cache and shared pool between the execution of each query (done from another SQL*Plus session connected as SYSDBA).

A single maximum for one query has the following execution plan:
SQL> select /*+ gather_plan_statistics */ max (ts) from outgoing where object_type = 'SYNONYM' ;

MAX(TS)
---------------------------------------------------------------------------
01-DEC-16 14.56.36.841635

Elapsed: 00:00:00.14

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));

SQL_ID 5mxhp3mh3gzmj, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max (ts) from outgoing 
where object_type = 'SYNONYM'

Plan hash value: 856418741

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |      1 |        |     4 (100)|          |      1 |00:00:00.01 |       4 |      4 |
|   1 |  SORT AGGREGATE              |             |      1 |      1 |            |          |      1 |00:00:00.01 |       4 |      4 |
|   2 |   FIRST ROW                  |             |      1 |      1 |     4   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("OBJECT_TYPE"='SYNONYM')

The multi-category query has the following execution plan:
SQL> select /*+ gather_plan_statistics */ max (ts) from outgoing 
     where object_type in ( 'SYNONYM' , 'JAVA CLASS' , 'VIEW' ) ;

MAX(TS)
---------------------------------------------------------------------------
01-DEC-16 14.56.36.841635

Elapsed: 00:00:36.98

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));

SQL_ID anb3s44frw6xa, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max (ts) from outgoing 
where object_type in ( 'SYNONYM' , 'JAVA CLASS' , 'VIEW' )

Plan hash value: 2754621723

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name        | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows | A-Time     | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |      1 |        |     4 (100)|          |      1 |00:00:36.87 |     169K|    169K|
|   1 |  SORT AGGREGATE    |             |      1 |      1 |            |          |      1 |00:00:36.87 |     169K|    169K|
|   2 |   INLIST ITERATOR  |             |      1 |        |            |          |     37M|00:02:23.89 |     169K|    169K|
|*  3 |    INDEX RANGE SCAN| IX_OUTGOING |      3 |      1 |     4   (0)| 00:00:01 |     37M|00:01:02.14 |     169K|    169K|
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("OBJECT_TYPE"='JAVA CLASS' OR "OBJECT_TYPE"='SYNONYM' OR "OBJECT_TYPE"='VIEW'))

The rephrased query using 3 separate per-category queries is:
SQL> select /*+ gather_plan_statistics */ max (maxts) from (
     select max (ts) maxts from outgoing where object_type = 'SYNONYM'
     union all
     select max (ts) maxts from outgoing where object_type = 'JAVA CLASS'
     union all
     select max (ts) maxts from outgoing where object_type = 'VIEW'
     ) ;

MAX(MAXTS)
---------------------------------------------------------------------------
01-DEC-16 14.56.36.841635

Elapsed: 00:00:00.15

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));

SQL_ID 1yv5abfvs46y7, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max (maxts) from ( 
select max (ts) maxts from outgoing where object_type = 'SYNONYM' 
union all select max (ts) maxts from outgoing where object_type = 'JAVA CLASS' 
union all select max (ts) maxts from outgoing where object_type = 'VIEW' )

Plan hash value: 2647832233

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name        | Starts | E-Rows | Cost(%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |      1 |        |    12(100)|          |      1 |00:00:00.04 |      12 |      9 |
|   1 |  SORT AGGREGATE                 |             |      1 |      1 |           |          |      1 |00:00:00.04 |      12 |      9 |
|   2 |   VIEW                          |             |      1 |      3 |    12  (0)| 00:00:01 |      3 |00:00:00.04 |      12 |      9 |
|   3 |    UNION-ALL                    |             |      1 |        |           |          |      3 |00:00:00.04 |      12 |      9 |
|   4 |     SORT AGGREGATE              |             |      1 |      1 |           |          |      1 |00:00:00.01 |       4 |      4 |
|   5 |      FIRST ROW                  |             |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
|*  6 |       INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
|   7 |     SORT AGGREGATE              |             |      1 |      1 |           |          |      1 |00:00:00.01 |       4 |      3 |
|   8 |      FIRST ROW                  |             |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      3 |
|*  9 |       INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      3 |
|  10 |     SORT AGGREGATE              |             |      1 |      1 |           |          |      1 |00:00:00.02 |       4 |      2 |
|  11 |      FIRST ROW                  |             |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.02 |       4 |      2 |
|* 12 |       INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.02 |       4 |      2 |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJECT_TYPE"='SYNONYM')
   9 - access("OBJECT_TYPE"='JAVA CLASS')
  12 - access("OBJECT_TYPE"='VIEW')

Conclusions

The single category query executes in under 1 second (0.14 seconds on my test system), while the multiple category query using "IN" takes over 30 seconds (36.98 seconds) which is much slower (over 100 times slower for elapsed time).

We can see that the single category query uses an optimized index access to just get the maximum timestamp value - "INDEX RANGE SCAN (MIN/MAX)" combined with a "FIRST ROW" operation in the execution plan. However the multiple category query using "IN" does not do this and instead does an "INDEX RANGE SCAN" combined with a "INLIST ITERATOR" which takes far longer to execute (as seen from the A-Time value in the reported execution plan and the 37M for A-Rows).

The solution is to rewrite the query as a "UNION ALL" of 3 separate single category queries. We can safely use "UNION ALL" because the categories are distinct from each other. This query executes in under a second which is almost as fast as for just one category, and uses the optimized "INDEX RANGE SCAN (MIN/MAX)" access method combined with the "FIRST ROW" operation.

Sometimes it can be quicker to run separate SELECT queries and merge the results together than to run one single query that does it all at once. Sometimes it can even be 100 times faster to run separate queries than one single merged query.

Thursday, 20 October 2016

AWR Summary Data Extracts

A long time ago (in a galaxy far away) I wrote a series of blog posts about directly extracting useful sets of data from the AWR snapshots in the DBA_HIST tables for subsequent analysis and graphing using things like Excel. This post is to summarise where I got to on this, and provide links back to the main posts for future reference.

A guiding principle of my approach was that I wanted to avoid the "mother of all AWR queries" that would extract every possible data value from all DBA_HIST tables for all snapshots, as it would be a really big and complex query (meaning difficult to debug if a mistake was introduced anywhere) and it could use up a lot of system resources and take some time to finish each time it was executed. Instead I prefer the smaller, targeted type of query that extracts just what you need from the minimum number of AWR tables, because it is simpler to understand and will be quicker to execute. That said, you could extend my approach of one sub-query per DBA_HIST table and include all of the main AWR snapshot tables into one single query, and then create a view using it. This would allow you to write simpler analysis queries later on against the view, without needing to need to know the details of all the various sub-queries and DBA_HIST tables.

Along the way I covered the following main topics in the blog posts:-
Although I did not post a query that merged all the different component sub-queries together, I did post a minimal Summary AWR Query at the end to show how you could use the basic AWR data in practise to look at the overall performance and behaviour of a system. By extending the main SELECT of the final query itself, you can add extra columns of data you want into the output, without any changes to the various component sub-queries (the "with" or Common Table Expressions). To put it another way, if there is some extra AWR data you want extracted then you need to work out which of the main data sets it falls under, and then which columns or rows in that AWR data set the data you want is stored in.

For instance here is another variation of the AWR Summary Query that also extracts the number of SQL statements executed, the separate wait times for User I/O and Cluster waits (normalised to a percentage of the total wait time in the snapshot), physical reads (normalised to a per second value), average disk read and redo write times.
/**
 * Name      : awr_sum_lite
 * Purpose   : Minimal one line summary of each AWR snapshot from a day
 * Descrip'n : Based on awr_sum_range with unneeded stuff deleted
 *             Only key essential stuff is output:
 *                 SQL executions / sec, #User calls / sec
 *                 Average Active Sessions, Wait % of Busy time
 *                 Cluster Wait % of wait time, User I/O % of wait time
 *                 Avg disk read time, Avg redo write time, #Disk reads / sec
 *             Values directly calculated, rather than raw underlying values
 * Usage     : @awr_sum_lite
 * Assumption: Want the last 4 whole days
 *               Which is from midnight 4 days ago to last midnight
 * To Do     : 
 */
--
set feedback off
set heading off
set newpage none
set verify off
--
set linesize 1000
set pages 28
set trimout on
set trimspool on
--
select ' ' from dual ;
select '                 AWR Lite Summary Report' from dual ;
select '                 =======================' from dual ;
-- select ' ' from dual ;
-- select ''
--    || '<= Database Time =>'
--    || '<= System Statistics ==>'
--    || '<== Waits ================>'
--  from dual ;
set newpage 1
set heading on
--
col snap_time           format a15      heading 'SNAP TIME'
col aas                 format 90.0     heading 'AAS'
col wait_pct            format 990.0    heading 'WAIT%'
col cluster_pct         format 90.0     heading 'CLUS%'
col user_io_pct         format 90.0     heading 'UIO%'
col executions_sec      format 9,990    heading 'EXEC/S'
col user_calls_sec      format 9,990    heading 'UCALL/S'
col physical_reads_sec  format 9,990    heading 'READS/S'
col avg_disk_read       format 990.0    heading '(MS)|AV RD'
col avg_redo_write      format 990.0    heading '(MS)|REDO W'
--
--
with
snaps as 
(select snap_id
      , dbid
      , end_snap_time
      , snap_interval
      , extract (second from snap_interval) 
       + (extract (minute from snap_interval) 
          + (extract (hour from snap_interval)
             + (extract (day from snap_interval) * 24)
             ) * 60
          ) * 60 snap_duration
  from (select csnaps.snap_id
             , csnaps.dbid
             , min (csnaps.end_interval_time) end_snap_time
             , min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
          from dba_hist_snapshot csnaps
         group by csnaps.snap_id, csnaps.dbid
       )
) -- snaps
, systimes as 
-- One row per Database Time Model with change in value between snapshots
(select systime.snap_id
      , systime.dbid
      , systime.stat_name
      , sum (systime.value - psystime.value) value
         from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
        where systime.snap_id = psystime.snap_id + 1
          and systime.dbid = psystime.dbid
          and systime.instance_number = psystime.instance_number
          and systime.stat_id = psystime.stat_id
-- Assume if stat_id the same so is the stat_name
        group by systime.snap_id, systime.dbid, systime.stat_name
) -- systimes
, sysstats as 
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
      , sysstat.dbid
      , sysstat.stat_name
      , sum (sysstat.value - psysstat.value) value
         from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
        where sysstat.snap_id = psysstat.snap_id + 1
          and sysstat.dbid = psysstat.dbid
          and sysstat.instance_number = psysstat.instance_number
          and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
        group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats
, syswaits as 
-- One row for total wait time, plus break down into major wait classes, and events
(select syswaitevents.snap_id
      , syswaitevents.dbid
      , sum (syswaitevents.time_waited_micro) all_wait_time
      , sum (decode (syswaitevents.wait_class, 'Commit',      syswaitevents.time_waited_micro, 0)) commit_time
      , sum (decode (syswaitevents.wait_class, 'Cluster',     syswaitevents.time_waited_micro, 0)) cluster_time
      , sum (decode (syswaitevents.wait_class, 'Concurrency', syswaitevents.time_waited_micro, 0)) concurrency_time
      , sum (decode (syswaitevents.wait_class, 'Network',     syswaitevents.time_waited_micro, 0)) network_time
      , sum (decode (syswaitevents.wait_class, 'System I/O',  syswaitevents.time_waited_micro, 0)) system_io_time
      , sum (decode (syswaitevents.wait_class, 'User I/O',    syswaitevents.time_waited_micro, 0)) user_io_time
   from 
        (select sysevent.snap_id
              , sysevent.dbid
              , sysevent.wait_class
              , sysevent.event_name
              , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
              , sum (sysevent.total_waits - psysevent.total_waits) wait_count
           from dba_hist_system_event sysevent, dba_hist_system_event psysevent
          where sysevent.snap_id = psysevent.snap_id + 1
            and sysevent.dbid = psysevent.dbid
            and sysevent.instance_number = psysevent.instance_number
            and sysevent.event_id = psysevent.event_id
            and sysevent.wait_class != 'Idle'  -- Ignore Idle wait events
          group by sysevent.snap_id
                 , sysevent.dbid
                 , sysevent.wait_class
                 , sysevent.event_name
        ) syswaitevents
  group by syswaitevents.snap_id
         , syswaitevents.dbid
) -- syswaits
-- Average Active Sessions, Wait % of Busy time
-- Cluster Wait % of wait time, User I/O % of wait time
-- SQL executions / sec, #User calls / sec
-- Avg disk read time, Avg redo write time, #Disk reads / sec
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
     , (user_calls_st.value / snaps.snap_duration)              user_calls_sec
     , (execs.value / snaps.snap_duration)                      executions_sec
     , (dbtime.value / 1000000)       / snaps.snap_duration     aas
-- If database active time is 1% of duration time or less ignore wait (0)
     , case when (dbtime.value / (1000 * snaps.snap_duration) ) > 1
            then (100 * syswaits.all_wait_time  / dbtime.value)
            else 0.0
       end wait_pct
     , (100 * syswaits.cluster_time   / syswaits.all_wait_time) cluster_pct
     , (100 * syswaits.user_io_time   / syswaits.all_wait_time) user_io_pct
     , (phys_reads.value / snaps.snap_duration)                 physical_reads_sec
     , (syswaits.user_io_time / phys_reads.value) / 1000        avg_disk_read
     , (redo_time_st.value * 10 / redo_write_st.value)          avg_redo_write
  from snaps
     join (select * from systimes where stat_name = 'DB time') dbtime
       on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
     join syswaits
       on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
     join (select * from sysstats where stat_name = 'execute count') execs
       on snaps.snap_id = execs.snap_id and snaps.dbid    = execs.dbid
     join (select * from sysstats where stat_name = 'user calls') user_calls_st
       on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
     join (select * from sysstats where stat_name = 'redo writes') redo_write_st
       on snaps.snap_id = redo_write_st.snap_id and snaps.dbid  = redo_write_st.dbid
     join (select * from sysstats where stat_name = 'redo write time') redo_time_st
       on snaps.snap_id = redo_time_st.snap_id and snaps.dbid  = redo_time_st.dbid
     join (select * from sysstats where stat_name = 'physical reads') phys_reads
       on snaps.snap_id = phys_reads.snap_id and snaps.dbid    = phys_reads.dbid
 where snaps.end_snap_time between 
       (trunc (sysdate) - 4) and (trunc (sysdate))
 order by snaps.end_snap_time
/
--
set feedback on
set lines 80
This query provides what I believe are a good, minimal set of useful measurements of activity and relative performance of an Oracle database. It allows easy viewing of relative activity (Average Active Sessions, User Calls / sec), waiting time as a percentage of database time, the main types of wait occurring (User I/O and Cluster), and some key disk I/O measurements (physical reads / sec, average disk read time, average redo write time). From these you can see how active the system is, whether any significant waits are occurring, and if they are I/O related, and see how these change over time between AWR snapshots.

Of course everyone has a different opinion of what the most important measurements are on an Oracle database. My key message here is that the approach I have followed lets you customise these queries to report out the performance and activity measurements you want from the data captured in the AWR snapshots. So if you want a different set of data values reported out, then it should be relatively easy to modify these queries to get what you want.

Wednesday, 30 March 2016

Full Table Scan not always as low as 0.5% of data!

Based on a reply from Jonathan Lewis to an OTN post on Explain Plans I need to correct some of the claims made in my previous post on Full Table Scans.

I'm going to repeat the inaccurate paragraphs and the Conclusions I posted then [edited here for brevity], and then point out the mistakes in them, and an attempt at a corrected set of paragraphs. I will also edit the original post and replace these incorrect paragraphs with the corrected ones.

I'll highlight the parts that are not strictly correct.
The main determinant for whether a FTS is the best access method is the fraction or percentage of the rows in the table being retrieved by the query, which are of course determined by the filter conditions in the query. If this fraction / percentage of data from the table is high enough then a Full Table Scan will always be a lower cost than using an Index Scan (except maybe for an Index Fast Full Scan, which is really another variation on a Full Scan).

So when you see a Full Table Scan in an execution plan you should check the query itself for how many rows it needs from the total number of rows in the table i.e. what percentage or fraction. If this is high enough then a FTS is indeed the "best" access method to get that data because it has the lowest cost. In which case you should really be asking why the query needs such a high percentage of the data from one table. Don't assume that the Optimizer is wrong - normally it isn't. It is just as likely to be your query that is affecting the execution plan chosen.

Conclusions

Although a Full Table Scan can seem a "brute force" approach to finding some matching records in a table, it can sometimes be the better way of doing it though. It all depends on how many rows you want back from the table as a percentage of the rows in the table.

If your query is retrieving more than about 0.20% to 0.35% of the data in a table then a Full Table Scan may well be the cheapest and best access method. [...]. Trying to force the Optimizer into using another access method in this circumstance is a waste of time, because all other access methods will be more expensive. The only exception might be another Full Scan type access, such as an Index Full Scan. But even then the gains (reduction in cost) will only be marginal i.e. not a full order of magnitude less.

When you see a FTS in a query execution plan you should check the estimated row count and what this is as a percentage of the row count in the table, and confirm if this estimate is correct or not. If the estimate is correct then a FTS is the lowest cost access method and the Optimizer is right to choose it. You should also check if your query is correct, or if there is something wrong with the filter conditions in it.
The main error is that it is wrong for me to claim any kind of actual figures for the percentage when the cutover will occur. As Jonathan points out in his reply on OTN it is possible for an index access to have a lower cost than a FTS for 24.5% of the data in a table, which is far above the 0.35% figure I mention. The actual cutover percentage is dependent on so many variables that it is incorrect for me to try and state a specific range for the cutover.

While I was not necessarily wrong in my specific examples because I made clear what my assumptions were, I was also not right in all possible scenarios. So I'm withdrawing my original Conclusions and updating them to be more strictly correct.
[paragraph deleted]

So when you see a Full Table Scan in an execution plan you should check the query and all other relevant factors to see if a FTS is indeed the "best" access method to get that data because it has the lowest cost. [deleted sentence] Don't assume that the Optimizer is wrong - normally it isn't. It is just as likely to be your query that is affecting the execution plan chosen.

Conclusions

Although a Full Table Scan can seem a "brute force" approach to finding some matching records in a table, it can sometimes be the better way of doing it though. It depends on several factors including how many rows you want back from the table, the number of blocks for the table, and the Clustering Factor of any indexes.

It is possible that for even low percentages of data being retrieved from a table that a Full Table Scan can be the cheapest and best access method. [...]. Trying to force the Optimizer into using another access method in this circumstance is a waste of time, because all other access methods will be more expensive. The only exception might be another Full Scan type access, such as an Index Full Scan. But even then the gains (reduction in cost) will only be marginal i.e. not a full order of magnitude less.

When you see a FTS in a query execution plan you should check many things including, but not limited to, the number of rows in the table, the number of blocks used for the table, the Clustering Factor for each possible index, and the estimated row count for the filters being used.

If the estimate is correct then a FTS is the lowest cost access method and the Optimizer is right to choose it. You should also check if your query is correct, or if there is something wrong with the filter conditions in it.

Counter Example

It is relatively simple to show a similar query to the ones I was using that uses an index instead of a full table scan for a far higher percentage of data in the table than 0.5%. Jonathan posted one such counter example in one of his replies on OTN, and we can easily replicate this using the same test data set from my original post.

select sum (one) from scantest where pkid between 1000000 and 2000000 ;

  SUM(ONE)
----------
   1000001

Statistics
----------------------------------------------------------
      10468  consistent gets
      10311  physical reads
   2  SQL*Net roundtrips to/from client
   6  sorts (memory)
   0  sorts (disk)
   1  rows processed

select sum (one) from scantest where pkid between 1000000 and 2000000

Plan hash value: 40453105

-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       | 10282 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST     |  1000K|  8789K| 10282   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PKID |  1000K|       |  2223   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PKID">=1000000 AND "PKID"<=2000000)
Note:
  • This query is retrieving 10% of the data in the table
  • An index range scan is being used to retrieve the data
  • A Full Table Scan had a cost of just over 22,000, so this index scan is much cheaper at 10,282
  • The key difference is the Clustering Factor of the pkid column
This index range scan is still cheaper when retrieving 20% of the data in the table:

select sum (one) from scantest where pkid between 1000000 and 3000000 ;

  SUM(ONE)
----------
   2000001

Statistics
----------------------------------------------------------
      20762  consistent gets
      20697  physical reads
   2  SQL*Net roundtrips to/from client
   6  sorts (memory)
   0  sorts (disk)
   1  rows processed

select sum (one) from scantest where pkid between 1000000 and 3000000

Plan hash value: 40453105

-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       | 20562 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST     |  2000K|    17M| 20562   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PKID |  2000K|       |  4442   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PKID">=1000000 AND "PKID"<=3000000)

Conclusion?

Be careful of generalising and summarising certain observed behaviours too much with Oracle. When considering index usage there is much more to it than just the number of rows being retrieved as a percentage of those in the table.

Monday, 21 March 2016

Creeping Slow Performance

A recent OTN post requesting help for a slow performing update where Oracle was doing a full table scan, also revealed that the update was getting slower and slower each month, and that new data was loaded into the table each month to be "processed" i.e. updated. Unfortunately this kind of design suffers from a natural creeping slowdown in the elapsed time of the update process. Each month it will take longer and longer to complete the processing of the newly loaded data, until any elapsed time targets are exceeded and it continues to take longer and longer to complete. There is a bad design pattern here, but obviously the original designers and implementers did not spot it.

Consider the following - in the first month after go-live the table only contains newly loaded data, so a full table scan is used. Let's say it takes 2 minutes to complete, and our target is 30 minutes for the whole process to complete, which includes other steps as well. Everything looks good performance wise, so we leave everything as it is and carry on.

The next month - month 2 - the size of the table has doubled, and Oracle still uses a full table scan because you want to update half the data in the table. This now takes 4 minutes because the table has doubled in size. But the overall process still completes within 30 minutes, so everything still looks good.

Over the next few months the table grows each month, and the elapsed time of the update continues to increase by 2 minutes each month, assuming a similar volume of data is loaded in each month. After 5 months the update now takes 10 minutes, but this is still within our 30 minute target.

But after 10 months the elapsed time is now 20 minutes, which is a significant proportion of the 30 minutes, and people are starting to ask questions. And after 15 months, if not sooner, the 30 minute target has been missed and the overall data load and process is now taking too long. Not only that, it is continuing to take longer and longer to complete each month, and will only get worse.

What can be done? Could an index help? No, is the simple answer. As I recently blogged in Full Table Scan - Friend or Foe?, the cutover point for the Optimizer to choose to use an index is when you want to retrieve less than 0.5% or so of the data in a table. After 10 months you are still processing 10% of the data in the table each month. And worse, even after 100 months (8+ years) you are still processing 1% of the data in the table each month, and a full table scan is still a lower execution cost than using any index on the table.

What could you do about this? One obvious solution is to use a "staging table" of the same structure into which the data is initially loaded each month. Then do the update and other processing on this table, and finally insert all the data into the main table. This solution does not suffer from the constant degradation in performance that the current one does. Oracle will still use a full table scan, but it is only reading from disk the data for this month, and so should complete in a relatively constant time each month - assuming data volumes are similar each month. The insert to the main table could be done quite quickly using the "append" hint for a direct load, and then the staging table could be truncated which should be very quick. Although the data is copied around in Oracle twice, it should still be much quicker than scanning a table with hundreds of months of data in it.

Other possibilities include partitioning by a key unique to each month's data, which is also used in the update statement so that partition pruning occurs. This may be problematic if there is not a clean partition key to use. It should not be a "processed flag", as this would be updated by the processing itself which would result in row movement between partitions and extra work by Oracle.

Ultimately the issue is about spotting these kind of bad design patterns early on and avoiding these pending future performance problems. To get to one month's data being less than 0.5% of the total data volume in the table would take you over 200 months, or almost 17 years. And during those 17 years this monthly data load processing would always take longer and longer to complete. A different design can avoid this future performance problem completely.

Tuesday, 1 March 2016

Full Table Scan - Friend or Foe?

[Or Don't be afraid of Full Table Scans]

UPDATE 30 March 2016
I've come to realise that there are some mistakes and inaccuracies in this post [see this post for more details], so I've edited this post and corrected what I can.  Rather than leave behind incorrect claims, I've replaced them by the corrected text, so hopefully now this post no longer has its previous faults.

ORIGINAL POST:
Many people consider a Full Table Scan (FTS) in a query execution plan to be a bad thing - reading every record from a table to find only those records the query needs. This is indicated by a "TABLE ACCESS FULL" in an execution plan. But is it really all that bad? Is it actually sometimes the right tool for the job? Can a Full Table Scan sometimes have a lower cost than using an Index?

My view is that the Oracle Optimizer will only choose a FTS under the following two conditions:
  • There is no other possible access method to get the needed data from the table
    • No other access method is currently available, but another access method might be a lower cost if it was available
  • A FTS is a lower cost (and so should be faster) than all other possible access methods
    • The FTS is the "best" access method of all available and the Optimizer is correct to use it
In other words, when you see a "TABLE ACCESS FULL" in an execution plan you should not jump to the conclusion that this is wrong and "needs to be fixed", because maybe it is correct and it is indeed the best access method for that particular step of the query operation. If you have concerns about the FTS then you need to double check things to find out whether it is the right thing for the Optimizer to be choosing, or whether a better access method would reduce the execution cost. Jumping to the wrong conclusion can lead you down a dead end when trying to improve a query's performance.

So when you see a Full Table Scan in an execution plan you should check the query and all other relevant factors to see if a FTS is indeed the "best" access method to get that data because it has the lowest cost.  Don't assume that the Optimizer is wrong - normally it isn't. It is just as likely to be your query that is affecting the execution plan chosen.

Calculating the Cutover Point

The actual percentage or fraction of rows in the table that is the cutover point between the Optimizer using a Full Table Scan or an available Index is itself mainly dependent on the average number of rows stored per data block. The other main factor is the value being used by the Optimizer for Multi-Block Read Count (MBRC), which is part of the System Statistics stored in the database. It is possible to calculate this cutover point yourself on a given table to see when a FTS really is cheaper and when an Index might help.

For a specific query involving an equality filter on a column, the other factor is what fraction of the data rows in the table have the same value stored in them. Oracle maintains this as a statistic on each column named "Density". By comparing the Density value for a column against the fraction of rows in the table needed by the query, you can see whether a FTS would be cheaper or not for a filter on that column. Remember that a percentage is just a fractional decimal value multiplied by 100 - so a Density of 0.0025 means 0.25%.

First the value used for MBRC (Multi-Block Read Count). This is stored in the AUX_STAT$ table owned by SYS. Its value only gets set when you gather system statistics. If set, then that particular value is used. If not set then a default value of 8 is used (note that it seems to ignore the value of the initialization parameter db_file_multiblock_read_count). The Optimizer also uses the values for MREADTIM and SREADTIM (Multi-block read time and Single block read time) also in the AUX_STAT$ table. Again, if these have not been set then it will use a default formula to derive them from the values for IOSEEKTIM and IOTFRSPEED.

To try and keep this explanation short we can jump to the following formulae used when you have not gathered system statistics.
SREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)  
MREADTIM = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED) 
For the default values of MBRC (8), IOSEEKTIM (10) and IOTFRSPEED (4096) and a DB_BLOCK_SIZE of 8192 (8 KB) you get an SREADTIM value of 12 milliseconds, and a MREADTIM value of 26 milliseconds.

When a Full Table Scan occurs the Optimizer knows that it will be doing multi-block reads. However, it needs to cast or convert the cost of these reads into units of single block reads. This is because all other I/O costs are in terms of single block reads internal to the Optimizer. And the "cost" of a multi-block read is not the same as the "cost" of a single block read - a multi-block read should take longer given the greater number of blocks being transferred.

Instead of dividing the number of blocks in the table by the full MBRC value, it "adjusts" this value by the ratio of SREADTIM to MREADTIM, and then uses this value in the cost calculation.

The cost of a FTS would then be calculated as (note the brackets):
FTS Cost = #Blocks in Table / (MBRC * SREADTIM / MREADTIM)
FTS Cost = (#Blocks in Table * MREADTIM) / (MBRC * SREADTIM)
For an Index lookup to be cheaper than this FTS cost, we can calculate the fraction of rows in the table where the cost of using an Index would be slightly less than this FTS cost. For this we need to know the following values from the statistics Oracle has on the table and the particular column used in the equality filter:
  • Number of Rows in the table - NUM_ROWS in USER_TABLES or USER_TAB_STATISTICS
  • Number of Blocks used for the table - BLOCKS in USER_TABLES or USER_TAB_STATISTICS
  • Density of the column - DENSITY in USER_TAB_COLS or USER_TAB_COL_STATISTICS
A Full Table Scan is cheaper than using an Index lookup when:
Density > (BLOCKS * MREADTIM) / (MBRC * SREADTIM * NUM_ROWS)
When the Density of a column is less than this value then an Index lookup would be cheaper.

Conclusions

Although a Full Table Scan can seem a "brute force" approach to finding some matching records in a table, it can sometimes be the better way of doing it though. It depends on several factors including how many rows you want back from the table, the number of blocks for the table, and the Clustering Factor of any indexes.

It is possible that for even low percentages of data being retrieved from a table that a Full Table Scan can be the cheapest and best access method. Trying to force the Optimizer into using another access method in this circumstance is a waste of time, because all other access methods will be more expensive. The only exception might be another Full Scan type access, such as an Index Full Scan. But even then the gains (reduction in cost) will only be marginal i.e. not a full order of magnitude less.

When you see a FTS in a query execution plan you should check many things including, but not limited to, the number of rows in the table, the number of blocks used for the table, the Clustering Factor for each possible index, and the estimated row count for the filters being used. If the estimate is correct then a FTS is the lowest cost access method and the Optimizer is right to choose it. You should also check if your query is correct, or if there is something wrong with the filter conditions in it.

Tests

Lets show whether this holds true with some tests. There is quite a lot of output here, but I want to provide everything so that anyone else can reproduce these tests on their own systems.

Database version - 12.1.0.2.0
Operating System - Oracle Linux 7.2

Scan Test table - 10 million rows, with columns of different repeated values:
drop table scantest ;
--
prompt Loading data ....
create table scantest 
tablespace testdata 
as 
select r pkid
     , 1 one -- a constant, which forces actual data row access
     , mod (r, 10)     pct10   -- 10 values     = 10% of data in table
     , mod (r, 20)     pct5    -- 20 values     = 5% of data in table
     , mod (r, 50)     pct2    -- 50 values     = 2% of data in table
     , mod (r, 100)    pct1    -- 100 values    = 1% of data in table
     , mod (r, 200)    pct05   -- 200 values    = 0.5% of data in table
     , mod (r, 500)    pct02   -- 500 values    = 0.2% of data in table
     , mod (r, 1000)   pct01   -- 1,000 values  = 0.1% of data in table
     , mod (r, 2000)   pct005  -- 2,000 values  = 0.05% of data in table
     , mod (r, 5000)   pct002  -- 5,000 values  = 0.02% of data in table
     , mod (r, 10000)  pct001  -- 10,000 values = 0.01% of data in table
     , mod (r, 20000)  pct0005 -- 20,000 values = 0.005% of data in table
     , mod (r, 50000)  pct0002 -- 50,000 values = 0.002% of data in table
  from (select rownum r
          from (select rownum r from dual connect by level <= 1000) a,
               (select rownum r from dual connect by level <= 1000) b,
               (select rownum r from dual connect by level <= 1000) c
         where rownum <= 10000000) ;
--
prompt Gathering Statistics ....
exec dbms_stats.gather_table_stats ('JOHN', 'SCANTEST')
--
prompt Creating Indexes ....
create unique index ix_scan_pkid on scantest (pkid) ;
create index ix_scan_pct2 on scantest (pct2) ;
create index ix_scan_pct1 on scantest (pct1) ;
create index ix_scan_pct05 on scantest (pct05) ;
create index ix_scan_pct02 on scantest (pct02) ;
create index ix_scan_pct01 on scantest (pct01) ;
This produces a table with the following statistics:
                                TABLE STATISTICS

Table                           %F  IT           In Ext         Next Ext  %I
------------------------------ --- --- ---------------- ---------------- ---
SCANTEST                        10   1           65,536        1,048,576

                                     Avg Spc
    Num Rows       Blocks E Blocks  Free/Blk Chains Avg Row Len
------------ ------------ -------- --------- ------ -----------
  10,000,000       80,951        0       .00      0      53.000

Table                           Num Extents       Blocks   Avg Blocks
------------------------------ ------------ ------------ ------------
SCANTEST                               151        81,920          543

                                  Leaf         Distinct       Clustering
Index                  Height   Blocks             Keys           Factor
-------------------- -------- -------- ---------------- ----------------
IX_SCAN_PKID                3   22,132       10,000,000           80,528
IX_SCAN_PCT2                3   19,503               50        4,026,368
IX_SCAN_PCT1                3   19,518              100        8,052,718
IX_SCAN_PCT05               3   20,212              200       10,000,000
IX_SCAN_PCT02               3   20,629              500       10,000,000
IX_SCAN_PCT01               3   20,768            1,000       10,000,000

Full Table Scan cost is calculated as follows:
FTS Cost = (BLOCKS * MREADTIM) / (MBRC * SREADTIM)
On my system MREADTIM and SREADTIM are not set, and the others have default values in the system statistics in AUX_STAT$ (IOSEEKTIM = 10, IOTFRSPEED = 4096), so using the formula from before this gives SREADTIM of 12 ms and MREADTIM of 26 ms.

Plugging these values into the previous formula gives about 22,000 as the cost for the FTS:
(80951 * 26) / (8 * 12) = 2104726 / 96 = 21,924.23 
We can calculate the cutover point up to which a FTS would be cheaper than an Index Scan using the formula given before:
Density > (BLOCKS * MREADTIM) / (MBRC * SREADTIM * NUM_ROWS)
Density > (80,951 * 26) / (8 * 12 * 10,000,000) = 0.002192 or 0.22% approximately
Remember that this is based on several assumptions (Index Clustering Factor) and simplifications (ignoring CPU costs). Thus the cutover point will not be a precise value of 0.0022 (0.22%) but something around this value.

Having calculated the FTS cost at about 22,000 we can also calculate the expected Index Scan costs, and see whether they do drop below the FTS cost under 0.22% of the data in the table.

An Index Scan cost has 2 components:
Index Access Cost = Levels + (Leaf Blocks * Filter Factor)
 Data Access Cost = Clustering Factor * Filter Factor
The Filter Factor is the selectivity of a single value for an equality predicate filter, being the Density of the column, or one over the Number of Distinct Values. In real terms it is the percentage of rows being retrieved, which in our test table is indicated by the column name.

For all indexes the number of Levels is 3 and the number of Leaf Blocks is about 20,500. Yes, it does vary but it is not significantly different for the tests we are doing here. Also from index PCT05 onwards the Clustering Factor is always 10 million i.e. it is not clustered and it is the row count in the table.

Column Name% of Rows returnedFraction of RowsIndex CostData CostTotal Cost
pct11%0.013 + (20,500 * 0.01) = 3 + 205 = 2088,000,000 * 0.01 = 80,00080,208
pct050.5%0.0053 + (20,500 * 0.005) = 3 + 102.5 = 105.510M * 0.005 = 50,00050,106
pct020.2%0.0023 + (20,500 * 0.002) = 3 + 41 = 4410M * 0.002 = 20,00020,044
pct010.1%0.0013 + 20.5 = 23.510M * 0.001 = 10,00010,024

Our calculations confirm that the expected cost of an Index Scan should drop below that of a Full Table Scan when less than 0.22% of the data in the table is being selected.

Does this bear out in practise? Will Oracle switch from a Full Table Scan to an Index Scan when the fraction of rows requested drops below 0.0022? Lets see.

Note: The buffer cache and shared pool were flushed from a SYSDBA session before each query execution, so the buffer cache was empty in each case. And we are only interested in the I/O statistics from the query execution - other statistics such as Redo and SQL*Net bytes sent are not relevant and have been removed.

Query 1:
set autotrace on statistics 
--
select sum (one) from scantest where pct2 = 1 ;

  SUM(ONE)
----------
    200000

Statistics
----------------------------------------------------------
      80704  consistent gets
      80541  physical reads
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed
Execution Plan (from dbms_xplan.display_cursor for the SQL_ID):
select sum (one) from scantest where pct2 = 1

Plan hash value: 1745049784

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |       |       | 22028 (100)|          |
|   1 |  SORT AGGREGATE    |          |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| SCANTEST |   200K|  1171K| 22028   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PCT2"=1)
Note:
  • 2% of 10 million rows is 200 thousand, which the Optimizer has correctly estimated.
  • Full table scan cost is 22,028. This is very close to my own estimate of about 22,000. The missing cost component would be for CPU work to filter each row.
  • 80,541 physical reads occurred, which is very close to the 80,951 blocks reported in the table.

Query 2:
Repeat this with a lower percentage column - pct05
select sum (one) from scantest where pct05 = 1 ;

  SUM(ONE)
----------
     50000

1 row selected.

Statistics
----------------------------------------------------------
      80704  consistent gets
      80541  physical reads
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

select sum (one) from scantest where pct05 = 1

Plan hash value: 1745049784

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |       |       | 22040 (100)|          |
|   1 |  SORT AGGREGATE    |          |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS FULL| SCANTEST | 50000 |   341K| 22040   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PCT05"=1)
Note:
  • Rows estimate is now 50,000, as expected
  • Full table scan cost is slightly different for some unknown reason, but still about 22,000.
  • Same number of physical reads occurred - 80,541.

Query 3:
Continue down to the next lower percentage column - pct02
select sum (one) from scantest where pct02 = 1 ;

  SUM(ONE)
----------
     20000

1 row selected.

Statistics
----------------------------------------------------------
  20212  consistent gets
  20060  physical reads
      2  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
      1  rows processed

select sum (one) from scantest where pct02 = 1

Plan hash value: 3458310886

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       | 20049 (100)|          |
|   1 |  SORT AGGREGATE                      |               |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST      | 20000 |   136K| 20049   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PCT02 | 20000 |       |    44   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PCT02"=1)
Note:
  • Now 0.2% of data in this table, which is low enough that an Index Scan should be cheaper
  • Index cost is indeed lower than a Full Table Scan, as expected - 20,049 versus 22,040
  • Index cost is very close to our own calculation of 20,044 - again CPU cost is the difference
  • Only 20,060 physical reads now - Index branch blocks + Leaf blocks + Data rows

Query 4:
Repeat this with a lower percentage column - pct01
select sum (one) from scantest where pct01 = 1 ;

  SUM(ONE)
----------
     10000

1 row selected.

Statistics
----------------------------------------------------------
      10192  consistent gets
      10038  physical reads
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

select sum (one) from scantest where pct01 = 1

Plan hash value: 1707962624

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       | 10025 (100)|          |
|   1 |  SORT AGGREGATE                      |               |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST      | 10000 | 70000 | 10025   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PCT01 | 10000 |       |    23   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PCT01"=1)
Note:
  • Index cost is halved because row count is halved - 10,025 versus 10,024 calculated
  • Only 10,038 physical reads now - Index branch blocks + Leaf blocks + Data rows

Query 5:
If we force the use of an index for the query on the PCT05 column using a hint we get the following:
select /*+ index (scantest (pct05)) */ sum (one) from scantest where pct05 = 1 ;

  SUM(ONE)
----------
     50000

Statistics
----------------------------------------------------------
  50271  consistent gets
  50133  physical reads
      2  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
      1  rows processed

select /*+ index (scantest (pct05)) */ sum (one) from scantest where pct05 = 1

Plan hash value: 3145193111

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       | 50116 (100)|          |
|   1 |  SORT AGGREGATE                      |               |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST      | 50000 |   341K| 50116   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PCT05 | 50000 |       |   104   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PCT05"=1)
Note how the estimated cost is 50,116, which is far greater than the 22,028 of a full table scan, and agrees with my earlier calculation of 50,106 for this index. Instead of the 80,000 physical disk reads done before, only 50,000 have been done. However, the 80,000 disk reads were really multi-block disk reads i.e. nearer 10,000 real disk reads would have been done, each for 8 disk blocks at once. The 50,000 disk reads for the index execution plan would be single block disk reads.