Wednesday, 22 November 2017

Chatty Applications and Simple SQL

One type of "poor performance" scenario I have come across a few times is due to what I call "chatty applications". These are applications that execute a disproportionally high number of what look like very simple SQL queries for every business transaction they do. And often this is a deliberate design choice by the application architects and developers, claiming that simpler SQL statements on single tables using indexed columns always leads to efficient execution plans, so it must be better than anything else. Unfortunately it is the high volume of execution of such "simple SQL queries" that is the cause of the poor performance these applications often experience. Both in terms of poor response times for individual users, and poor total throughput across all the users. Why then are such "chatty applications" the cause of poor performance and not the solution?

First, there are network round trips needed for each SQL statement executed, and these are typically milliseconds in terms of order of magnitude of elapsed time. The SQL statement text is sent from the client to the database server, and then the resultant data sent back from the server to the client. And when there are hundreds of SQL statements being executed per business transaction, then these all add up. Executing a "business transaction" in such a "chatty application" can end up spending more time transferring data back and forth over the network than actually doing work with that data on either the client or database server system.

Second, the high SQL execution volume results in high library cache usage and contention for access to it inside Oracle. For each SQL statement executed, Oracle will check if that SQL is already in the library cache (in the shared pool in the SGA) and if so will go ahead and use the associated execution plan. Such library cache look ups involve various temporary internal locks on the library cache, to make sure it doesn't change while your session is examining it. And the more SQL statements being executed leads to more library cache lookups and more contention for the controlling locks on it.

Third, the volume of data transferred to the client and then processed on it can be much higher than if a single, more complex SQL statement was used instead. This is also about network delays, but this time it is due to the amount of intermediate result data being transferred to the client, as it works its way through data from different tables as part of its "business transaction".

Ultimately this all a variation on the "row by row" processing mind set where all the raw data is pulled back into the client, which further processes it such as joining together data from multiple tables. And a "row set" based approach can be more efficient - executing fewer but more complex SQL statements on the database server, to send only the final resultant data back to the client.

I have seen this kind of scenario in action at a number of customers, where it was their application design that was slowing down and limiting the performance of their application. In some cases we were able to identify the highest use cases in the application and successfully modify it to use fewer, more complex SQL queries instead, so that more work is done per execution by far fewer SQL statements. In other cases it was not as clear due to the application developers using a "glue layer" of third party software to handle the application to database mappings and constructing the SQL queries to be executed in each case. Fixing this requires reconfiguration of the "glue layer" to issue more complex SQL queries involving multi-table joins. However, in at least one case the customer did not at that time have enough expertise in how to use this "glue layer" to achieve that. They had just used it in an "out of the box" type fashion, and were now suffering as a consequence of that.

Probably the worst case I saw was an application using another "glue layer" that only ever issued SQL queries for a single column of data at a time of the form "select one-column from table-name where pkid_col = value", and consequently did multiple such SQL queries to get each individual column it needed. And obviously this led to horrible performance and scalability. Luckily the application had been written to use a well defined internal API for data access, which was in turn wrapped around the "glue layer". We were able to insert an extra layer between the application's API and the "glue software", intercepting the application API data access calls before the "glue software" was called. We implemented a caching layer here, based on the primary key of a record from each table, which was always used on each query. For a new record query, with a different primary key value, the API now did a "select *" into a local record buffer of the whole data record, and then returned the one individual column requested. If the primary key requested matched that already in the current record buffer, then the code just returned the individual column value directly from that, avoiding any further SQL execution and executing any code in the "glue layer".

As you might expect performance increased significantly as a result of this change - replacing the multiple single column SQL queries with just one whole record SQL query followed by in-memory, local accesses, avoiding all those network round trips. Also the net load on the Oracle database server decreased significantly, with less contention on the library cache, and the CPU and disk utilisation decreased significantly as well. Consequently the performance of all the other SQL statements being executed on that database server improved as well due to those resources being freed up. A definite "win-win" all around, by accessing the data in the database in a much more efficient way.

Thursday, 16 November 2017

Redundant Grandparent Foreign Keys and Cardinality Estimate Errors

This post is about how a slightly de-normalized database design involving redundant foreign keys to other tables can end up producing sub-optimal execution plans for queries that use those extra joins as additional filter conditions.

By default the Oracle Optimizer assumes that different columns of data in a table are independent of each other, and that their data values are not correlated with each other in any way. When a query has filter conditions on multiple columns in a table, the Optimizer will combine their filter factors together assuming they are independent of each other i.e. unless it has explicit information to the contrary. So if each filter factor was estimated to match 1% of the rows in the table, then their combined filter factor would be to match 0.01% of the rows in the table (1% of 1%).

However, if the data in these columns is in fact correlated then this filter factor estimate will be wrong. A classic example is "month of birth" and "zodiac sign". Each has only 12 possible values, and the Optimizer will assume that there are 144 possible combinations. But in fact there are only 24 possible value combinations because each zodiac sign straddles only two months. Assuming a uniform distribution amongst these 24 possible values within the data rows in the table, then each pair of values would match 1 / 24 or 4.17% of the rows in the table. However, the Optimizer will assume that a pair of values would match 1 / 144 or only 0.69% of the rows in the table. Such misestimates can make the Optimizer choose a relatively inefficient access such as using an index instead of a full table scan, due to estimating too few rows to match the filters and a too low cost for the data access.

Often this kind of correlation between data values in columns in a table occurs naturally, and there is little you can directly do about it. The "month of birth" and "zodiac sign" is one such example. However, this kind of correlation between columns can also occur as a result of how the tables have been designed, and can result in sub-optimal execution plans being produced.

One scenario is adding a redundant column to a table as a foreign key to an indirectly related table, such as a "grandparent" table that is the parent of the table's direct parent. A specific example would be something like putting the "customer / account identifier" into the "order line item" table, as a redundant copy of that in the "order" table, along with the "order identifier". Or putting both the "product main category id" and "product sub-category id" into the "product" table, when there is a hierarchy of categories.

Again, in these cases, the Optimizer will assume that such columns are independent of each other, unless it has explicit information otherwise. The danger now is that the Optimizer will produce different execution plans for what are essentially the same queries depending on whether the redundant joins are included or not.

Lets look at a solid example. I created three tables each with a parent / child relationship between them, and more rows in the child tables than the parent tables. I won't post the SQL DDL for these as it will make the post too long, but their structure and contents should be straightforward enough, and all values are uniformly distributed with an equal number of child records per parent record. There are indexes on the primary key columns, and the individual foreign key columns, but nothing else. Primary key and foreign key constraints are explicitly specified.

The tables are Grandparent (1,000 rows), Parent (100,000 rows), and Child (10,000,000 rows) i.e. 100:1 ratios, and each has a primary key column named "pkid". The Parent table has a foreign key column (gp_id) to Grandparent, while the Child table has a foreign key to Parent (of p_id) and also a redundant foreign key to Grandparent (of gp_id).

I will now execute three different but equivalent queries against these tables using different combinations of the possible foreign key joins:
  1. Only direct joins from child to parent, and then parent to grandparent
  2. As previous query plus additional (redundant) join from child to grandparent
  3. Direct from child to grandparent only, and parent table omitted from query
Each query has a filter condition on another column in the grandparent table restricting the matching rows to just 5 rows in that table.

I'm using Oracle 11gR2 on Oracle Linux:
SQL> select * from v$version;

Oracle Database 11g Enterprise Edition Release - 64bit Production
PL/SQL Release - Production
CORE Production
TNS for Linux: Version - Production
NLSRTL Version - Production

First query - direct joins between parent / child tables:
SQL> @test1


SQL_ID  fddwv6gp5m26h, child number 0
select sum ( row_count   
from child c, parent p, grandparent gp
where c.p_id = p.pkid   
  and p.gp_id = gp.pkid   
  and gp.pct05 = 1

Plan hash value: 4174412392

| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT     |             |       |       | 25668 (100)|          |
|   1 |  SORT AGGREGATE      |             |     1 |    25 |            |          |
|*  2 |   HASH JOIN          |             | 49591 |  1210K| 25668   (1)| 00:05:09 |
|*  3 |    HASH JOIN         |             |   500 |  8500 |   244   (1)| 00:00:03 |
|*  4 |     TABLE ACCESS FULL| GRANDPARENT |     5 |    40 |     5   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| PARENT      |   100K|   878K|   238   (1)| 00:00:03 |
|   6 |    TABLE ACCESS FULL | CHILD       |    10M|    76M| 25398   (1)| 00:05:05 |

Predicate Information (identified by operation id):
   2 - access("C"."P_ID"="P"."PKID")
   3 - access("P"."GP_ID"="GP"."PKID")
   4 - filter("GP"."PCT05"=1)
We can see that it is estimating 5 rows to be retrieved from the Grandparent table, 500 from the join to Parent, and about 50,000 from the join to Child - operations 4, 3, and 2. This is correct given the 100:1 ratio between the rows in each table. The cost is estimated at just over 25,000, which is really just the sum of the costs of the full table scans involved.

Second query - redundant join added between Child and Grandparent:
SQL> @test2


SQL_ID  0a703yvda9z4h, child number 0
select sum ( row_count 
from child c, parent p, grandparent gp  
where c.p_id = p.pkid   
  and p.gp_id = gp.pkid   
  and gp.pct05 = 1
  and c.gp_id = gp.pkid

Plan hash value: 4140991566

| Id  | Operation                          | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT                   |               |       |       | 12271 (100)|          |
|   1 |  SORT AGGREGATE                    |               |     1 |    29 |            |          |
|   2 |   NESTED LOOPS                     |               |    50 |  1450 | 12271   (1)| 00:02:28 |
|   3 |    NESTED LOOPS                    |               | 49500 |  1450 | 12271   (1)| 00:02:28 |
|*  4 |     HASH JOIN                      |               |   500 |  8500 |   244   (1)| 00:00:03 |
|*  5 |      TABLE ACCESS FULL             | GRANDPARENT   |     5 |    40 |     5   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL             | PARENT        |   100K|   878K|   238   (1)| 00:00:03 |
|   7 |     BITMAP CONVERSION TO ROWIDS    |               |       |       |            |          |
|   8 |      BITMAP AND                    |               |       |       |            |          |
|   9 |       BITMAP CONVERSION FROM ROWIDS|               |       |       |            |          |
|* 10 |        INDEX RANGE SCAN            | IX_CHILD_PID  |    99 |       |     2   (0)| 00:00:01 |
|  11 |       BITMAP CONVERSION FROM ROWIDS|               |       |       |            |          |
|* 12 |        INDEX RANGE SCAN            | IX_CHILD_GPID |    99 |       |    21   (0)| 00:00:01 |
|  13 |    TABLE ACCESS BY INDEX ROWID     | CHILD         |     1 |    12 | 12271   (1)| 00:02:28 |

Predicate Information (identified by operation id):
   4 - access("P"."GP_ID"="GP"."PKID")
   5 - filter("GP"."PCT05"=1)
  10 - access("C"."P_ID"="P"."PKID")
  12 - access("C"."GP_ID"="GP"."PKID")
The only difference to the previous query is the addition of the extra filter condition of "c.gp_id = gp.pkid", using the redundant join between Child and Grandparent. The row count is the same - 50,000 - because the extra filter condition is redundant and doesn't change the number of matching rows in any way. But the execution plan is completely different, because the Optimizer has assumed that the two filters on Child are independent of each other, but this is not true.

The execution plan starts the same, filtering on Grandparent to 5 estimated rows, then joining to Parent to produce 500 estimated rows (operation 4 Hash Join). Now it does 2 Nested Loops - the inner to get ROWID's from Child of matching rows, and the outer to get the data row itself for the "one" column used in the output. And this outermost Nested Loop (operation 2) is only estimated to produce 50 rows, which is clearly incorrect.

What has happened is that because the "gp_id" column in Child has 1,000 distinct values in it, the Optimizer has reduced the final row estimate by this ratio i.e. it is estimating 50 matching rows and not 50,000 matching rows. Based on this it has costed the index based access to Child to come out at 12,271 which is lower than the 25,000+ cost of the full table scan in the first execution plan.

This cost seems to be arrived at as the cost of index access for one pair of values - 21 + 2 = 23 - plus one more for the access to the data row in the table by ROWID i.e. 24 cost per Child row. For the estimated 50 Child rows this gives a total cost of 12,000, which with the costs so far to the Hash Join of 244 are very close to the reported total cost of 12,271.

In fact this execution plan will take longer to execute because it will actually be processing 50,000 Child records and not just 50, and the execution plan of the first query would actually be the better one. But it is the assumption that these two foreign key columns are independent of each other that has led the Optimizer to produce this sub-optimal plan.

Third query - remove Parent table completely from query, and join directly from Child to Grandparent:
SQL> @test4


SQL_ID  76tptvjkbx08x, child number 0
select sum ( row_count   
  from child c, grandparent gp
 where c.gp_id = gp.pkid
   and gp.pct05 = 1

Plan hash value: 2796906588

| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT    |             |       |       | 25435 (100)|          |
|   1 |  SORT AGGREGATE     |             |     1 |    15 |            |          |
|*  2 |   HASH JOIN         |             | 50000 |   732K| 25435   (1)| 00:05:06 |
|*  3 |    TABLE ACCESS FULL| GRANDPARENT |     5 |    40 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| CHILD       |    10M|    66M| 25403   (1)| 00:05:05 |

Predicate Information (identified by operation id):
   2 - access("C"."GP_ID"="GP"."PKID")
   3 - filter("GP"."PCT05"=1)
The result is the same as before - 50,000 rows - and the execution plan is basically the first one with Parent removed - two full table scans and a Hash Join. With only one filter condition on the Child table the Optimizer has correctly estimated the number of matching rows - 50,000 - and produced the appropriate execution plan.


Beware of database designs that include redundant foreign key columns between child and grandparent tables, or across even more levels of a relationship hierarchy, and how you write queries involving such tables. While it might seem somehow better to include all known joins between all the tables in a query, this can cause the Optimizer to misestimate the number of matching rows from a table, and to choose non-optimal access methods as a result. Rather than helping the Optimizer, adding such redundant join conditions to queries actually makes things worse. As a general rule you are better off only having joins in a query between directly related tables, and not to indirectly related ones. Otherwise, eliminate the middle, intermediate tables if possible and join directly between the relevant tables, which should produce the same results.

I saw this happen recently at a client, and removing such a redundant join condition led to the Optimizer producing more realistic row estimates and a much better execution plan, with a corresponding faster execution given the data volumes involved.

The other approach to dealing with this kind of scenario is to provide extra information to the Optimizer so that it knows when two or more columns are correlated. This can be done simply by creating an index on just those columns in the table, or by creating extended statistics on the column group. As long as this extra information is present, then the Optimizer will produce a better row estimate and a better execution plan. When I tested this with such an index on the Child table on both "p_id" and "gp_id" together, the second query with the redundant join the execution plan went back to the original one - 3 full table scans and 2 hash joins.

Also be aware that Oracle is always trying to make the Optimizer more "intelligent" so the behaviour and execution plans produced can change between versions of Oracle for the same query on the same database tables. I briefly tested these same queries on Oracle 12c and got completely different execution plans. In 12c I actually got an Adaptive Execution Plan which combined both plan variations under a parent Statistics Collector operation. Depending on how many matching rows were initially retrieved it could either do a full table scan on Parent and Child with Hash Joins, or use the indexes on the foreign key columns to do direct row lookups in Nested Loops. Combined with other feedback mechanisms in the 12c Optimizer, it is possible for the Optimizer to now detect such non-optimal execution plans and produce better ones on the next execution of the same query.

Tuesday, 31 October 2017

I'm back, again, late 2017

Nothing much to say other then I'm back again, and hope to do some more blog posting soon. My excuse for the lack of posts is that I've been busy helping a client with a large data migration project. That is now over, with all the data successfully extracted for loading into their new system, so I've got more time available to properly write up some technical Oracle or performance posts.

Thursday, 29 December 2016

Fixing Popular Posts Margin on Blogger

I use the Blogger platform for this blog, and I recently added the "Popular Posts" widget to the sidebar. Unfortunately it did not display correctly, with the first character or two of each blog post title being lost and chopped off, as if the whole thing had been shifted to the left for some reason. Here is how I fixed it to display properly.

After a lot of reading up on HTML and CSS I realised that these posts appeared within a section with an associated style name, in this case a class. In turn I could add an entry to the HTML template for my blog to define a shift to the right for this particular class style so that it would display properly. And this shift would only be applied to this section of the blog page and no other.

What this boils down to is the following:
  • The "Popular Posts" widget reference in the HTML has an attribute of "id='PopularPosts1'"
  • The list of posts within this widget has two class attributes, one of which is "popular-posts"
  • I could shift the list of posts far enough across to the right using the style property of "padding-left"
To apply this to your Blogger blog template do the following:
  • In Blogger click on "Template" on the left hand side of your blog dashboard (under Layout and above Settings)
  • Click the on "Edit HTML" button under the small image of your blog
  • Expand the "
    " section near the top by clicking on it or on the right pointing triangle on the left hand side
    • This was at line 7 on my template
  • Scroll down to the end of this section, which finishes with
  • Above this on their own lines add the following:
#PopularPosts1 .popular-posts {
 padding-left: 15px
  • Then click on the "Save Template" button at the top of the screen
  • If you now redisplay your blog you should see the "Popular Posts" list has now been shifted over to the right and aligned nicely under the "Popular Posts" heading
Note the following:
  • The first line states that this style is only to be used within a "popular-posts" class element that itself occurs within a "PopularPosts1" ID element.
    • This should match exactly the content section we want to be shifted across in the blog page
  • The second line sets the left hand side padding before the displayed content to be "15px" (pixels)
    • The value of "15px" was obtained by simple trial and error starting with smaller values until the list was shifted over enough.
I'm not saying my solution is perfect or absolutely correct in any way, I'm just saying that it works for me, and it seems to conform to the various ways CSS works and how Blogger defines its page sections. It would seem that all of the different sidebar elements have a right shift built into them, except for Popular Posts of just titles (no snippets or thumbnails). This solution adds in such a right shift so the blog post titles are now all visible in the blog page.

Monday, 12 December 2016

Announcing Bottleneck Data Solutions

SPOILER ALERT: This particular blog post is a blatant self promotion for me and the various Oracle database services I am offering.

I may not have mentioned it explicitly before but I have left the corporate world of permanent employment and have gone independent to offer my services direct to clients (at reasonable rates, of course). I'm doing this through my newly created company Bottleneck Data Solutions, which has its own associated, obligatory web site.

Basically I am offering my many years of knowledge, experience and expertise on Oracle database design and performance tuning directly to clients for short term consulting engagements or longer term contracts. I'm not big on doing the self promotion thing unfortunately, but I won't succeed unless I get the message out and publicise myself one way or another. So this is a one-off post to raise awareness of what I'm offering through Bottleneck Data Solutions and make you aware that I'm available for Oracle database performance tuning, database design and development engagements.

I have also uploaded some of the SQL Scripts that I commonly use, to share with everyone. Nothing revolutionary I am afraid, just the standard "free space" and "system activity summary" type reports.

At the moment I am offering to do a free, initial Performance Review of an AWR Report. If I can spot anything obvious affecting performance in the AWR Report then I'll provide feedback to you.

Why "Bottleneck Data Solutions"? Well data is everywhere, and is the lifeblood of the modern company. But I have seen a significant number of cases where slow application performance was caused by a badly designed and implemented database. In other words, the way the data was organized and stored in the database was becoming the bottleneck, limiting how quickly data could be supplied to the application software. Solving this kind of bottleneck requires a mix of short term performance tuning to deal with the most obvious hot spots, and a longer term review of the database and redesigning the most critical parts.

Thank you for your patience. The normal service of technical blog posts will be resumed again in the future.

Thursday, 1 December 2016

When 2 Queries Are Better Than 1

A general rule of thumb with queries in Oracle is to use just one query when you can and leave it to the Optimizer to work out the best possible execution plan i.e. decomposing one query into multiple separate queries can end up making the database do more work than it needs to. I came across a case the past week where the opposite was true - making Oracle do separate "smaller" queries was far quicker than using one overall query.


Imagine that we have two tables - new data arrives in one table (INCOMING) and we process it in some way and insert the resultant data into another table (OUTGOING). Data records in both tables have a timestamp on them for when they were created, and they are inserted in timestamp order. We want to process new messages at regular intervals adding them to the OUTGOING table, but they are not deleted from the INCOMING table when processed. Instead a separate purge job runs each night to delete incoming data older than 48 hours. In this scenario we can use the timestamp on each record to work out the maximum timestamp of the OUTGOING data records we have processed so far, and only retrieve INCOMING data records that have been added since then.
select max (ts) from outgoing ;
An additional complexity in this is that the INCOMING data records fall into different categories or types, and these are processed separately to each other due to different processing rules and requirements. This means that when getting the maximum timestamp from the OUTGOING data set we also need to restrict on the category.
select max (ts) from outgoing 
where category = 'CATEGORY1' ;
But it also turns out that some categories are similar enough to each other that they can be processed together at the same time from INCOMING, which should be more efficient that executing multiple separate queries against what can be a large input table with historical data in it. This means we now need the maximum timestamp from OUTGOING across a number of categories. This is most obviously done with this query.
select max (ts) from outgoing 
where category in ( 'CATEGORY1' , 'CATEGORY2', 'CATEGORY3' ) ;
This is valid because these 3 categories are all extracted from INCOMING at the same time and processed together, so we do want the maximum timestamp across all of them together.

The Problem

When executed this "SELECT MAX (TS) WHERE CATEGORY IN (...)" query took a long time to execute (tens of seconds). Previous tests had shown that Oracle can efficiently use an index to get a minimum or maximum value from a table (sub-second), so why was this particular query taking so much longer?

The conclusion was that a single category maximum or minimum is executed differently to one on a group of categories, and this was the cause of the longer execution time. Furthermore, rewriting the query as a union of separate single category queries was much faster (over 100 times faster in this case!) and almost as fast as the original single category query.

Test Environment

This happened for real on Oracle 12c, on a non-CDB database. I'm going to replicate it on a separate test environment, also on Oracle 12c on a non-CDB database, running on Oracle Linux 7.
SQL> select * from v$version ;
BANNER                                                                           CON_ID
---------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release - 64bit Production          0
PL/SQL Release - Production                                                0
CORE        Production                                              0
TNS for Linux: Version - Production                                        0
NLSRTL Version - Production                                                0
I have taken a copy of the data in DBA_OBJECTS which in my test instance had over 90,000 rows in it occupying 12 MB of disk storage, and then doubled this 9 times to be 512 times larger than the original data set, giving over 45 million rows using over 5 GB of disk storage.

Initially I thought about using the CREATED column for the timestamp, and the OBJECT_TYPE for the category, but this ran into problems. First is that the doubling the rows each time does not change any data values, which means that we have very few different, distinct values for CREATED and each value repeats many times. This has an impact on calculations the Optimizer makes and the costs it estimates. Also the real data type in the real system was TIMESTAMP and not DATE. So I decided to add an extra column TS to the test table, and to update this later on to be a near-unique value. By giving the column an initial value it means that space is allocated within each record in each block in Oracle, and the later update should not cause any more space to be allocated or rows to be moved between blocks.

My test system has only 2 GB for UNDO, which means that I cannot do an UPDATE on a table larger than this. So what I have done is double the table 7 times to just over 1 GB in size, do the UPDATE on the TS to a near-unique random value, then double the table another 2 times. This means that each TS value occurs 4 times, which I hope will still be good enough for the tests I am doing.

In terms of the spread of values, in the real system the incoming data is purged every day, deleting data records older than 48 hours. The UPDATE is using the current system timestamp value, offset by up to 48 hours as a value in seconds.
create table outgoing
tablespace testdata
as select * from dba_objects ;
alter table outgoing modify (owner not null, object_name not null,
object_id not null, object_type not null, created not null) ;
-- Try and make the data loading go faster by minimising redo logging
alter table outgoing nologging ;
alter table outgoing add (ts timestamp default systimestamp not null) ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
update outgoing
set ts = systimestamp - numtodsinterval (dbms_random.value (0, 60*60*24*2), 'second') ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
insert /*+ append */ into outgoing select * from outgoing ;
commit ;
select count (*) cnt , count (distinct ts) ts_cnt from outgoing ;
select bytes / (1024 * 1024) mb
from user_segments
where segment_name = 'OUTGOING' ;
This produced 46,384,640 rows occupying 6.3 GB of disk storage. I also checked the minimum and maximum values of the timestamp column, and they did cover the past 48 hours (29-NOV-16 to 01-DEC-16

Test Queries & Execution Plans

In the real system there are less than 10 categories, and the processing involves picking some of the ones that occur the most frequently. I'll use the OBJECT_TYPE for the category, and pick the top 3 most frequently occurring values for the test queries:
select object_type, count (*)
from outgoing
group by object_type order by 2, 1 ;

OBJECT_TYPE               COUNT(*)
----------------------- ----------
TABLE                      1217536
TYPE                       1296384
INDEX                      2208256
VIEW                       3252224
JAVA CLASS                15777792
SYNONYM                   18917376
So our test categories will be the bottom 3.

Let's create an index on the category and timestamp columns, again just like the real system:
create index ix_outgoing on outgoing (object_type, ts)
tablespace testdata ;
And let's make sure our statistics on this table are up to date:
exec dbms_stats.gather_table_stats ('JOHN', 'OUTGOING')
When I ran these test queries and then displayed the execution plans the costs shown seemed wrong for the second query of multiple categories. The reported cost was too low given the much higher elapsed time and what I knew it was doing during the execution. To show the real elapsed time I have done "set timing on" in SQL*Plus, and I have run the queries with the "gather_plan_statistics" hint, and used extra options to get the run time execution statistics for the query execution. This will show more about what really happened during the execution.

And to try and make things more even and comparable between the queries I also flushed the buffer cache and shared pool between the execution of each query (done from another SQL*Plus session connected as SYSDBA).

A single maximum for one query has the following execution plan:
SQL> select /*+ gather_plan_statistics */ max (ts) from outgoing where object_type = 'SYNONYM' ;


Elapsed: 00:00:00.14

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));

SQL_ID 5mxhp3mh3gzmj, child number 0
select /*+ gather_plan_statistics */ max (ts) from outgoing 
where object_type = 'SYNONYM'

Plan hash value: 856418741

| Id  | Operation                    | Name        | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
|   0 | SELECT STATEMENT             |             |      1 |        |     4 (100)|          |      1 |00:00:00.01 |       4 |      4 |
|   1 |  SORT AGGREGATE              |             |      1 |      1 |            |          |      1 |00:00:00.01 |       4 |      4 |
|   2 |   FIRST ROW                  |             |      1 |      1 |     4   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |

Predicate Information (identified by operation id):
   3 - access("OBJECT_TYPE"='SYNONYM')

The multi-category query has the following execution plan:
SQL> select /*+ gather_plan_statistics */ max (ts) from outgoing 
     where object_type in ( 'SYNONYM' , 'JAVA CLASS' , 'VIEW' ) ;


Elapsed: 00:00:36.98

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));

SQL_ID anb3s44frw6xa, child number 0
select /*+ gather_plan_statistics */ max (ts) from outgoing 
where object_type in ( 'SYNONYM' , 'JAVA CLASS' , 'VIEW' )

Plan hash value: 2754621723

| Id  | Operation          | Name        | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows | A-Time     | Buffers | Reads  |
|   0 | SELECT STATEMENT   |             |      1 |        |     4 (100)|          |      1 |00:00:36.87 |     169K|    169K|
|   1 |  SORT AGGREGATE    |             |      1 |      1 |            |          |      1 |00:00:36.87 |     169K|    169K|
|   2 |   INLIST ITERATOR  |             |      1 |        |            |          |     37M|00:02:23.89 |     169K|    169K|
|*  3 |    INDEX RANGE SCAN| IX_OUTGOING |      3 |      1 |     4   (0)| 00:00:01 |     37M|00:01:02.14 |     169K|    169K|

Predicate Information (identified by operation id):

The rephrased query using 3 separate per-category queries is:
SQL> select /*+ gather_plan_statistics */ max (maxts) from (
     select max (ts) maxts from outgoing where object_type = 'SYNONYM'
     union all
     select max (ts) maxts from outgoing where object_type = 'JAVA CLASS'
     union all
     select max (ts) maxts from outgoing where object_type = 'VIEW'
     ) ;


Elapsed: 00:00:00.15

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL -BYTES IOSTATS LAST'));

SQL_ID 1yv5abfvs46y7, child number 0
select /*+ gather_plan_statistics */ max (maxts) from ( 
select max (ts) maxts from outgoing where object_type = 'SYNONYM' 
union all select max (ts) maxts from outgoing where object_type = 'JAVA CLASS' 
union all select max (ts) maxts from outgoing where object_type = 'VIEW' )

Plan hash value: 2647832233

| Id  | Operation                       | Name        | Starts | E-Rows | Cost(%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
|   0 | SELECT STATEMENT                |             |      1 |        |    12(100)|          |      1 |00:00:00.04 |      12 |      9 |
|   1 |  SORT AGGREGATE                 |             |      1 |      1 |           |          |      1 |00:00:00.04 |      12 |      9 |
|   2 |   VIEW                          |             |      1 |      3 |    12  (0)| 00:00:01 |      3 |00:00:00.04 |      12 |      9 |
|   3 |    UNION-ALL                    |             |      1 |        |           |          |      3 |00:00:00.04 |      12 |      9 |
|   4 |     SORT AGGREGATE              |             |      1 |      1 |           |          |      1 |00:00:00.01 |       4 |      4 |
|   5 |      FIRST ROW                  |             |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
|*  6 |       INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      4 |
|   7 |     SORT AGGREGATE              |             |      1 |      1 |           |          |      1 |00:00:00.01 |       4 |      3 |
|   8 |      FIRST ROW                  |             |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      3 |
|*  9 |       INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      3 |
|  10 |     SORT AGGREGATE              |             |      1 |      1 |           |          |      1 |00:00:00.02 |       4 |      2 |
|  11 |      FIRST ROW                  |             |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.02 |       4 |      2 |
|* 12 |       INDEX RANGE SCAN (MIN/MAX)| IX_OUTGOING |      1 |      1 |     4  (0)| 00:00:01 |      1 |00:00:00.02 |       4 |      2 |

Predicate Information (identified by operation id):
   6 - access("OBJECT_TYPE"='SYNONYM')
   9 - access("OBJECT_TYPE"='JAVA CLASS')
  12 - access("OBJECT_TYPE"='VIEW')


The single category query executes in under 1 second (0.14 seconds on my test system), while the multiple category query using "IN" takes over 30 seconds (36.98 seconds) which is much slower (over 100 times slower for elapsed time).

We can see that the single category query uses an optimized index access to just get the maximum timestamp value - "INDEX RANGE SCAN (MIN/MAX)" combined with a "FIRST ROW" operation in the execution plan. However the multiple category query using "IN" does not do this and instead does an "INDEX RANGE SCAN" combined with a "INLIST ITERATOR" which takes far longer to execute (as seen from the A-Time value in the reported execution plan and the 37M for A-Rows).

The solution is to rewrite the query as a "UNION ALL" of 3 separate single category queries. We can safely use "UNION ALL" because the categories are distinct from each other. This query executes in under a second which is almost as fast as for just one category, and uses the optimized "INDEX RANGE SCAN (MIN/MAX)" access method combined with the "FIRST ROW" operation.

Sometimes it can be quicker to run separate SELECT queries and merge the results together than to run one single query that does it all at once. Sometimes it can even be 100 times faster to run separate queries than one single merged query.

Thursday, 20 October 2016

AWR Summary Data Extracts

A long time ago (in a galaxy far away) I wrote a series of blog posts about directly extracting useful sets of data from the AWR snapshots in the DBA_HIST tables for subsequent analysis and graphing using things like Excel. This post is to summarise where I got to on this, and provide links back to the main posts for future reference.

A guiding principle of my approach was that I wanted to avoid the "mother of all AWR queries" that would extract every possible data value from all DBA_HIST tables for all snapshots, as it would be a really big and complex query (meaning difficult to debug if a mistake was introduced anywhere) and it could use up a lot of system resources and take some time to finish each time it was executed. Instead I prefer the smaller, targeted type of query that extracts just what you need from the minimum number of AWR tables, because it is simpler to understand and will be quicker to execute. That said, you could extend my approach of one sub-query per DBA_HIST table and include all of the main AWR snapshot tables into one single query, and then create a view using it. This would allow you to write simpler analysis queries later on against the view, without needing to need to know the details of all the various sub-queries and DBA_HIST tables.

Along the way I covered the following main topics in the blog posts:-
Although I did not post a query that merged all the different component sub-queries together, I did post a minimal Summary AWR Query at the end to show how you could use the basic AWR data in practise to look at the overall performance and behaviour of a system. By extending the main SELECT of the final query itself, you can add extra columns of data you want into the output, without any changes to the various component sub-queries (the "with" or Common Table Expressions). To put it another way, if there is some extra AWR data you want extracted then you need to work out which of the main data sets it falls under, and then which columns or rows in that AWR data set the data you want is stored in.

For instance here is another variation of the AWR Summary Query that also extracts the number of SQL statements executed, the separate wait times for User I/O and Cluster waits (normalised to a percentage of the total wait time in the snapshot), physical reads (normalised to a per second value), average disk read and redo write times.
 * Name      : awr_sum_lite
 * Purpose   : Minimal one line summary of each AWR snapshot from a day
 * Descrip'n : Based on awr_sum_range with unneeded stuff deleted
 *             Only key essential stuff is output:
 *                 SQL executions / sec, #User calls / sec
 *                 Average Active Sessions, Wait % of Busy time
 *                 Cluster Wait % of wait time, User I/O % of wait time
 *                 Avg disk read time, Avg redo write time, #Disk reads / sec
 *             Values directly calculated, rather than raw underlying values
 * Usage     : @awr_sum_lite
 * Assumption: Want the last 4 whole days
 *               Which is from midnight 4 days ago to last midnight
 * To Do     : 
set feedback off
set heading off
set newpage none
set verify off
set linesize 1000
set pages 28
set trimout on
set trimspool on
select ' ' from dual ;
select '                 AWR Lite Summary Report' from dual ;
select '                 =======================' from dual ;
-- select ' ' from dual ;
-- select ''
--    || '<= Database Time =>'
--    || '<= System Statistics ==>'
--    || '<== Waits ================>'
--  from dual ;
set newpage 1
set heading on
col snap_time           format a15      heading 'SNAP TIME'
col aas                 format 90.0     heading 'AAS'
col wait_pct            format 990.0    heading 'WAIT%'
col cluster_pct         format 90.0     heading 'CLUS%'
col user_io_pct         format 90.0     heading 'UIO%'
col executions_sec      format 9,990    heading 'EXEC/S'
col user_calls_sec      format 9,990    heading 'UCALL/S'
col physical_reads_sec  format 9,990    heading 'READS/S'
col avg_disk_read       format 990.0    heading '(MS)|AV RD'
col avg_redo_write      format 990.0    heading '(MS)|REDO W'
snaps as 
(select snap_id
      , dbid
      , end_snap_time
      , snap_interval
      , extract (second from snap_interval) 
       + (extract (minute from snap_interval) 
          + (extract (hour from snap_interval)
             + (extract (day from snap_interval) * 24)
             ) * 60
          ) * 60 snap_duration
  from (select csnaps.snap_id
             , csnaps.dbid
             , min (csnaps.end_interval_time) end_snap_time
             , min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
          from dba_hist_snapshot csnaps
         group by csnaps.snap_id, csnaps.dbid
) -- snaps
, systimes as 
-- One row per Database Time Model with change in value between snapshots
(select systime.snap_id
      , systime.dbid
      , systime.stat_name
      , sum (systime.value - psystime.value) value
         from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
        where systime.snap_id = psystime.snap_id + 1
          and systime.dbid = psystime.dbid
          and systime.instance_number = psystime.instance_number
          and systime.stat_id = psystime.stat_id
-- Assume if stat_id the same so is the stat_name
        group by systime.snap_id, systime.dbid, systime.stat_name
) -- systimes
, sysstats as 
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
      , sysstat.dbid
      , sysstat.stat_name
      , sum (sysstat.value - psysstat.value) value
         from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
        where sysstat.snap_id = psysstat.snap_id + 1
          and sysstat.dbid = psysstat.dbid
          and sysstat.instance_number = psysstat.instance_number
          and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
        group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats
, syswaits as 
-- One row for total wait time, plus break down into major wait classes, and events
(select syswaitevents.snap_id
      , syswaitevents.dbid
      , sum (syswaitevents.time_waited_micro) all_wait_time
      , sum (decode (syswaitevents.wait_class, 'Commit',      syswaitevents.time_waited_micro, 0)) commit_time
      , sum (decode (syswaitevents.wait_class, 'Cluster',     syswaitevents.time_waited_micro, 0)) cluster_time
      , sum (decode (syswaitevents.wait_class, 'Concurrency', syswaitevents.time_waited_micro, 0)) concurrency_time
      , sum (decode (syswaitevents.wait_class, 'Network',     syswaitevents.time_waited_micro, 0)) network_time
      , sum (decode (syswaitevents.wait_class, 'System I/O',  syswaitevents.time_waited_micro, 0)) system_io_time
      , sum (decode (syswaitevents.wait_class, 'User I/O',    syswaitevents.time_waited_micro, 0)) user_io_time
        (select sysevent.snap_id
              , sysevent.dbid
              , sysevent.wait_class
              , sysevent.event_name
              , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
              , sum (sysevent.total_waits - psysevent.total_waits) wait_count
           from dba_hist_system_event sysevent, dba_hist_system_event psysevent
          where sysevent.snap_id = psysevent.snap_id + 1
            and sysevent.dbid = psysevent.dbid
            and sysevent.instance_number = psysevent.instance_number
            and sysevent.event_id = psysevent.event_id
            and sysevent.wait_class != 'Idle'  -- Ignore Idle wait events
          group by sysevent.snap_id
                 , sysevent.dbid
                 , sysevent.wait_class
                 , sysevent.event_name
        ) syswaitevents
  group by syswaitevents.snap_id
         , syswaitevents.dbid
) -- syswaits
-- Average Active Sessions, Wait % of Busy time
-- Cluster Wait % of wait time, User I/O % of wait time
-- SQL executions / sec, #User calls / sec
-- Avg disk read time, Avg redo write time, #Disk reads / sec
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
     , (user_calls_st.value / snaps.snap_duration)              user_calls_sec
     , (execs.value / snaps.snap_duration)                      executions_sec
     , (dbtime.value / 1000000)       / snaps.snap_duration     aas
-- If database active time is 1% of duration time or less ignore wait (0)
     , case when (dbtime.value / (1000 * snaps.snap_duration) ) > 1
            then (100 * syswaits.all_wait_time  / dbtime.value)
            else 0.0
       end wait_pct
     , (100 * syswaits.cluster_time   / syswaits.all_wait_time) cluster_pct
     , (100 * syswaits.user_io_time   / syswaits.all_wait_time) user_io_pct
     , (phys_reads.value / snaps.snap_duration)                 physical_reads_sec
     , (syswaits.user_io_time / phys_reads.value) / 1000        avg_disk_read
     , (redo_time_st.value * 10 / redo_write_st.value)          avg_redo_write
  from snaps
     join (select * from systimes where stat_name = 'DB time') dbtime
       on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
     join syswaits
       on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
     join (select * from sysstats where stat_name = 'execute count') execs
       on snaps.snap_id = execs.snap_id and snaps.dbid    = execs.dbid
     join (select * from sysstats where stat_name = 'user calls') user_calls_st
       on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
     join (select * from sysstats where stat_name = 'redo writes') redo_write_st
       on snaps.snap_id = redo_write_st.snap_id and snaps.dbid  = redo_write_st.dbid
     join (select * from sysstats where stat_name = 'redo write time') redo_time_st
       on snaps.snap_id = redo_time_st.snap_id and snaps.dbid  = redo_time_st.dbid
     join (select * from sysstats where stat_name = 'physical reads') phys_reads
       on snaps.snap_id = phys_reads.snap_id and snaps.dbid    = phys_reads.dbid
 where snaps.end_snap_time between 
       (trunc (sysdate) - 4) and (trunc (sysdate))
 order by snaps.end_snap_time
set feedback on
set lines 80
This query provides what I believe are a good, minimal set of useful measurements of activity and relative performance of an Oracle database. It allows easy viewing of relative activity (Average Active Sessions, User Calls / sec), waiting time as a percentage of database time, the main types of wait occurring (User I/O and Cluster), and some key disk I/O measurements (physical reads / sec, average disk read time, average redo write time). From these you can see how active the system is, whether any significant waits are occurring, and if they are I/O related, and see how these change over time between AWR snapshots.

Of course everyone has a different opinion of what the most important measurements are on an Oracle database. My key message here is that the approach I have followed lets you customise these queries to report out the performance and activity measurements you want from the data captured in the AWR snapshots. So if you want a different set of data values reported out, then it should be relatively easy to modify these queries to get what you want.