After talking a bit about the AWR report (if you haven’t read the previous posts, you can find part 1 here and part 2 here), I think one of the best ways to understand it is to talk about real examples. In this part I’ll give you a few examples and tips regarding the report.
As I said before, the fact that IO appears at the top of the wait events table does not mean that IO is the bottleneck. So how can we know if IO really is a problem? There are several things to check here, and I’ll go through a few examples:
Lots of IO
A lot of IO is often mistaken for an IO issue, but the fact that we have a lot of IO doesn’t mean that there is an IO problem. I’ve seen quite a few cases where SQL tuning reduced the amount of IO and improved the overall performance of the system (and the storage load).
Another thing to check is the type of IO. For example, “Direct Path” events (reads and writes) might indicate lots of IO against the TEMP tablespace, which can be caused by a PGA that is too small. In that case, we can drill down to the PGA statistics in the AWR report to verify.
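A similar check can also be done directly against the instance. A minimal sketch, assuming access to V$PGASTAT (the exact statistic names can vary slightly between versions):

```sql
-- Rough check of PGA pressure: a low "cache hit percentage" or a
-- non-zero "over allocation count" suggests the PGA may be too small.
SELECT name, value, unit
FROM   v$pgastat
WHERE  name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'cache hit percentage',
                'over allocation count');
```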
Slow IO
Slow IO can quite easily be identified by the response time of IO operations. This information can be found in the wait events table under “average wait time” for the IO events. For more detailed and accurate information, go to the IO section of the report, where we will see the details per tablespace and disk. The expected numbers depend on the system and many other factors, but as a rough rule of thumb, the average wait should not exceed about 20ms (and usually should be a single digit).
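Outside the report, a similar per-file picture can be pulled from the instance itself. A hedged sketch, assuming V$FILESTAT and V$DATAFILE (the times there are in centiseconds, hence the multiplication by 10 to get milliseconds):

```sql
-- Average read latency per data file; as a rough rule of thumb,
-- values well above ~20ms deserve a closer look.
SELECT f.file#,
       d.name,
       f.phyrds,
       ROUND(f.readtim * 10 / NULLIF(f.phyrds, 0), 1) AS avg_read_ms
FROM   v$filestat f
JOIN   v$datafile d ON d.file# = f.file#
ORDER  BY avg_read_ms DESC NULLS LAST;
```

Note that these counters are cumulative since instance startup, so for a specific time window the AWR (or two snapshots) is still the better source.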
In one case a customer had a performance problem. In the AWR report I saw that the disk response time was up to 5000ms (that’s 5 seconds!). I didn’t need to look any further in the report. It was an old (and probably faulty) storage subsystem, so they replaced it, and everything has worked perfectly ever since.
Don’t replace the storage too fast
Note that even if the IO is slow, it doesn’t mean you have to go out and buy new storage. First, make sure that all this IO is really needed: maybe the memory is underutilized, or maybe some SQL tuning should be done. Only if the storage can’t keep up while the rest of the system seems to be behaving well might it be the right time to get a stronger storage system.
SQL Executions (1)
As I mentioned in the previous parts, the SQL statistics section is very important. There is something you need to remember, though: the SQL statistics parts are sorted by totals (i.e. the “SQL statements by physical reads” part is sorted by the total reads of the query, the “SQL statements by CPU” part is sorted by the total CPU of the query, and so on). This is sometimes misleading. For example, let’s take two queries from the report:
Query A is executed 1 time with a total of 1M physical reads
Query B is executed 1000 times with a total of 900K physical reads
In this case, query A will appear higher in the table than query B. Should we handle query A first? It probably requires some tuning or attention, but it might also be a one-time thing. Query B, on the other hand, runs 1000 times during the report period and performs 900 physical reads EVERY TIME (on average, of course). We might actually want to handle it first.
What I’m trying to say is that the order of the queries doesn’t always reflect their importance when tuning a database. You should look at the entire table and pick the best candidates to start with.
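The same idea can be applied outside the report, against the library cache. A sketch, assuming access to V$SQL, that orders statements by physical reads per execution rather than by the total (on 12c and later the top rows can be limited with FETCH FIRST; on older versions, wrap it in an inline view with ROWNUM):

```sql
-- Order by physical reads per execution instead of total reads,
-- so frequently-run expensive statements surface first.
SELECT sql_id,
       executions,
       disk_reads,
       ROUND(disk_reads / NULLIF(executions, 0)) AS reads_per_exec
FROM   v$sql
WHERE  executions > 0
ORDER  BY reads_per_exec DESC
FETCH FIRST 20 ROWS ONLY;
```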
SQL Executions (2)
Sometimes we can find problems simply by looking at the query statistics. One time I saw a query that was executed every second or so and looked a bit strange (it was at a customer I had visited many times, so I was somewhat familiar with the application side). When I went to the developer and asked him about this query, he said “oh, don’t worry about this query, it runs only once, when the application loads”. The thing is, it didn’t: it was executed all the time. When I told him that, he found a bug in the application and fixed it. Just by looking at the statistics, something seemed strange, and it was indeed a bug.
SQL Hard Parses
The AWR report is all about aggregates: we only see the top SQL statements and the total activity. I’ve seen cases where there was stress on the shared pool because of many hard parses. In the AWR report we might see evidence of that in the wait events (library cache latches, for example), in the “Execute to Parse” statistic in the “Instance Efficiency Percentages” section, and elsewhere. But remember, you will not see the queries that cause it. Many hard parses usually mean that we have many SQL statements with literals instead of bind variables. Each of these statements is executed only once and therefore will not appear in the report, even though they are the root cause of the problem.
In order to identify these queries, we will need to query the library cache directly or generate SQL traces.
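One way to query the library cache for such statements is to group by FORCE_MATCHING_SIGNATURE, which is identical for statements that differ only in their literal values. A sketch, assuming V$SQL on 10g or later (the threshold of 10 copies is arbitrary):

```sql
-- Statements that are identical except for literals share a
-- FORCE_MATCHING_SIGNATURE; a high count of copies suggests
-- bind variables are missing.
SELECT force_matching_signature,
       COUNT(*)      AS copies,
       MIN(sql_text) AS sample_text
FROM   v$sql
WHERE  force_matching_signature > 0
GROUP  BY force_matching_signature
HAVING COUNT(*) > 10
ORDER  BY copies DESC;
```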
That’s it for this post and series. I hope it will help you when you analyze your next AWR report. Feel free to comment or ask questions, maybe it will lead to another post in this series.