In the previous part I explained the top part of the report. In this part I’ll continue with the actual information about database activity, what we are looking for and other important things we can find in the report.
The table with the wait events is one of the most important in the report. As I said before, the report shows the general activity of the database during the report time window. The wait events table shows the top events (5 or 10, depends on the version) during this time. Wait events are activity statistics of the database server processes. When a server process executes something it can either work on the CPU (like sort, access memory, etc.) or wait for other resource (such as IO, lock or something else). When the process waits for something, this wait is what we call “wait event”. In this table, we see the totals (and averages) of these wait events, plus the total of the CPU activity (which is not a wait event, but also important).
Looking at this table is the first step for understanding the database activity. Look at the CPU usage, if we have a single CPU and report of 1 hour, one hour of CPU time means we are CPU bound (as we consume all CPU and might need more). But if we see 1 minutes of CPU time, it means that the database is either idle or mostly wait on different resources. Understanding if what we see is reasonable or not requires to know the different wait events. But from Oracle 10g, we can also see the wait class, which makes it much easier.
CPU and IO is something that we usually expect to see, but again, we need to understand the system and numbers. For example, the “db file scattered read” is the wait event for multi-block IO read, which is usually full table scan. This is something that we wouldn’t want to see in OLTP systems too much, but is very expected for DWH. Another example is the “Enqueue” wait events (TX, TM, etc.). These are locks, meaning that the process was waiting for a lock to be released. This is something we probably don’t want to see, and we should talk about that with the developers.
I’ll jump to the SQL statistics part of the report, as it also contains a very important information that we should look at. I won’t cover most of the other parts of the reports, but once we suspect something, we can go to the specific part of the report that is related to this topic and investigate it.
So, the part of SQL statistics contains the top SQL that were executed in the database during the report duration in several different tables. Note that not all of the SQL statements will be here, only the top ones.
The different tables show the SQL statement ordered by different metrics like execution time, number of executions, logical reads, physical reads, parse calls and more. The statements here can relate to the wait events we see in the wait events table. For example, a small update that took a long time might explain a “TX enqueue” wait event.
Things that should raise a flag may be statements with many execution (maybe more than we would expect), statements that take longer than expected and statements that we don’t expect at all (like gather statistics in the middle of the day).
Other Parts of the Report
As I already said, the report contains many different parts on pretty much everything. Important parts that we haven’t talked about include IO statistics (which can help identify IO bottlenecks), segment statistics (or identify overloaded tables/indexes/etc) and
One last important part to check is right at the end of the report. This part contains all initialization parameters with non-default value. I sometimes see un-documented parameters that might affect the database, or some strange setting (like optimizer parameters), so it’s a good thing to glance over this part.
This is the basics of the report, there are so many different cases and scenarios, that I can’t cover them in a post. In the next part I’ll give a few examples for things I found in AWR reports in the past.