Reading AWR Report – Part 3

After talking a bit about the AWR report (if you haven’t read the previous posts, you can find part 1 here and part 2 here), I think one of the best ways to understand it is to talk about real examples. In this part I’ll give you a few examples and tips regarding the report.

IO Issues

As I said before, the fact that IO appears at the top of the wait events table does not mean that IO is the bottleneck. So how can we know if IO is a problem? There are several things to check here, so let's go through a few examples:

Lots of IO

Many times, a lot of IO is mistaken for an IO issue. But the fact that we have a lot of IO doesn't mean that there is an IO problem. I've seen quite a few cases where SQL tuning reduced the amount of IO and improved the overall performance of the system (and the storage load).

Another thing to check is the type of IO. For example, “Direct Path” events (reads and writes) might indicate lots of IO against the TEMP tablespace. That can be caused by having a PGA that is too small. In this case we can drill down in the AWR report to the PGA statistics to check if this is the case.
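Outside the report, a quick way to sanity-check the PGA (a sketch, assuming you have access to the v$ dynamic performance views) is to look at a few rows of v$pgastat. A non-zero "over allocation count" or a low "cache hit percentage" suggests the PGA target may indeed be too small:

```sql
-- Rough PGA health check (assumes access to v$pgastat).
-- "over allocation count" > 0 means Oracle had to allocate PGA
-- beyond the target; a low "cache hit percentage" points to
-- work areas spilling to TEMP (the direct path reads/writes we saw).
SELECT name, value, unit
FROM   v$pgastat
WHERE  name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'over allocation count',
                'cache hit percentage');
```

Note that v$pgastat shows values accumulated since instance startup, unlike the AWR report, which covers a specific interval.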

Slow IO

Slow IO can be identified quite easily by the response time of IO operations. This info can be found in the wait event table, under "average wait time" for the IO events. For more detailed and accurate information, go to the IO part of the report, where we will see the details per tablespace and disk. The expected numbers depend on the system and many other factors, but as a rough rule of thumb, the average wait should not exceed about 20ms (and usually should be in the single digits).

In one case a customer had a performance problem. In the AWR report I saw that the disk response time was up to 5000ms (that's 5 seconds!). I didn't need to look any further in the report. It was an old (and probably faulty) storage subsystem, so they replaced it, and everything has worked perfectly ever since.

Don’t replace the storage too fast

Note that even if the IO is slow, it doesn't mean you have to go and buy a new storage system. First, make sure that all this IO is needed. Maybe the memory is underutilized, or maybe some SQL tuning should be done. Only if the system seems to be well behaved otherwise, and the storage still can't keep up, might it be the right time to get a stronger storage system.

SQL Executions (1)

As I mentioned in the previous parts, the SQL statistics part is very important. There is something you need to remember, though: the SQL statistics parts are sorted by the total (i.e. the "SQL ordered by Reads" part will be sorted by the total physical reads of the query, the "SQL ordered by CPU Time" part will be sorted by the total CPU of the query, and so on). This is sometimes misleading. For example, let's take two queries from the report:

Query A is executed 1 time with a total of 1M physical reads

Query B is executed 1000 times with a total of 900K physical reads

In this case, query A will appear higher in the table than query B. Should we handle query A first? It probably requires some tuning or attention, but it also might be a one-time thing. Query B, on the other hand, runs 1000 times in the report period and performs 900 physical reads EVERY TIME (on average, of course). We might actually want to handle it first.

What I'm trying to say is that the order of the queries doesn't always reflect their importance when tuning a database. You should look at the entire table and look for the best candidates to start with.
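The same normalization can be done outside the report. As a sketch (using v$sqlarea, which covers cursors currently in the library cache rather than an AWR interval), we can re-rank the top queries by physical reads per execution instead of by total:

```sql
-- Re-rank queries by physical reads PER EXECUTION rather than by total.
-- The inline view computes the per-execution figure; ROWNUM then keeps
-- the top 20 after sorting (version-safe back to older releases).
SELECT *
FROM  (SELECT sql_id,
              executions,
              disk_reads,
              ROUND(disk_reads / executions) AS reads_per_exec
       FROM   v$sqlarea
       WHERE  executions > 0
       ORDER  BY reads_per_exec DESC)
WHERE ROWNUM <= 20;
```

In the query A / query B example above, this ordering would put query A (1M reads in 1 execution) first and query B (900 reads per execution) much lower, so the per-execution view is a complement to the report's total-based ordering, not a replacement.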

SQL Executions (2)

Sometimes we can find problems simply by looking at the query statistics. One time I saw a query that was executed every second or so, and it looked a bit strange (it was for a customer I had visited many times, so I was a little familiar with the application side). When I went to the developer and asked him about this query, he said: "oh, don't worry about this query, it runs only once, when the application loads". The thing is that it didn't: it was being executed all the time. When I told him that, he found a bug in the application and fixed it. Just by looking at the statistics, something seemed strange, and it was indeed a bug.

SQL Hard Parses

The AWR report is all about aggregates. We only see the top SQL statements and the total activity. I've seen some cases where there was stress on the shared pool because of many hard parses. In the AWR report we might see evidence of that in the wait events (library cache latches, for example), in the "Execute to Parse" statistic in the "Instance Efficiency Percentages" part, and more. But remember, you will not see the queries that cause it. Many hard parses usually mean that we have many SQL statements with literals instead of bind variables. Each of these queries is executed only once and therefore will not appear in the report, even though they are the root cause of the problem.

In order to identify these queries, we will need to query the library cache directly or generate SQL traces.
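One way to query the library cache for this (a sketch, assuming Oracle 10g or later) is to group v$sql by FORCE_MATCHING_SIGNATURE, which is identical for statements that differ only in their literal values:

```sql
-- Find families of near-identical statements that differ only in literals.
-- FORCE_MATCHING_SIGNATURE is the same for statements that would collapse
-- into one cursor under CURSOR_SHARING=FORCE, so a large COUNT(*) per
-- signature points at literal SQL that should use bind variables.
SELECT   force_matching_signature,
         COUNT(*)      AS cursor_count,
         MIN(sql_text) AS sample_sql_text
FROM     v$sql
WHERE    force_matching_signature > 0
GROUP BY force_matching_signature
HAVING   COUNT(*) > 50
ORDER BY cursor_count DESC;
```

The threshold of 50 is just an illustrative cutoff; tune it to your shared pool. Cursors age out of the library cache, so on a stressed shared pool you may need to run this close to the problem window.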

Summary

That's it for this post and for the series. I hope it helps you when you analyze your next AWR report. Feel free to comment or ask questions; maybe it will lead to another post in this series.


8 thoughts on “Reading AWR Report – Part 3”

  1. Thanks sir for such a good series. But how can we determine whether the statistics (numbers) in the AWR are good or bad? How can we understand, and firmly explain to a client, that the report is good? Is there any way to do a calculation on these statistics so that, as a DBA, I can understand the flow and health of the database? Again, thanks sir


    • Thanks for your comment. It’s a very difficult question, as there is no really good answer. It’s like asking: “Is 5 seconds for a query considered good performance?”. I can’t answer that, as it depends on so many factors.
      With AWR it’s the same. Generally, if the db time is low (1 hour report and 10 minutes db time, for example), I would guess that it’s OK, but it doesn’t have to be. If the top event is “enqueue”, it might mean that they have a locking problem (or a strange design). If it’s IO, it might be OK.
      The best option, in my opinion, is to compare it to another AWR from a different time. That way you can see the differences.
      But if I just get an AWR and I don’t see anything that looks strange, without some input from the client, or knowing the system, it’s really hard to know.
      Hope it helps.
      Liron


  2. Hi,

    Suppose you find the average read time of the IO wait events to be more than 20ms. Which section of the AWR would you examine to find the culprit queries? Is that “sql ordered by physical reads” or “sql ordered by gets”? Thanks in advance for your time.

    Regards,
    Vish


    • Hi Vish,
      The “gets” part of the SQL statistics is for “buffer gets”, meaning blocks read from memory, while “reads” means “physical reads”. So “reads” is the one that interests you. However, it is sorted by the amount of reads, not by time. So there might be a query there that reads a lot of data but does it fast (from fast disks, for example), while another query reads very few blocks, but slowly. It might help in some cases (for example, if the IO is overloaded, the top queries by “reads” might be the cause of that), but it’s not necessarily related.
      Liron


      • Thanks for your reply Liron. I have a little more doubt here: how can we know whether the IO subsystem is overloaded, or whether the IO subsystem itself is slow? In other words, how do we decide if we are doing too many reads and that is causing the IO subsystem to be slow, or if our IO subsystem is itself painfully slow? My second question: since every logical get requires a physical IO at first, is it not better for us to aim for “sql ordered by gets” and tune those queries to reduce the IO bottleneck, rather than concentrating on “sql ordered by physical reads”?

        Regards,
        Vish


  3. Vish,
    It’s difficult to know if you’re overloading the IO, it depends on the IO subsystem itself. I would consult with the storage people about that.
    Regarding physical vs. logical gets: some queries always read from the disk, like full table scans and parallel queries that use direct IO. Also, if you have a rare query that scans large tables that are not in memory, it can overload the IO by itself, and it will appear in the physical reads section, not necessarily in the logical reads one.
    Liron


    • Thank you so much Liron for your time. Waiting for your next article in this series; I am sure all these articles are useful for all the newbies like me.

      Regards,
      Vish

