Correlation reports for Statspack and AWR data

The upcoming now available version of Mumbai will adds correlation reports for Statspack and AWR data.

Here is how it works: Once you have one or more SP/AWR reports for “Instance activity”, “Wait events” or “Top SQL” open in Mumbai, you can click on “Correlate by time”.

Here’s an example: In the following screenshot you see an SP Wait events diagram which shows an interval of high wait times for wait event “SQL*Net more data from DB link”. Also notice that there is already a SP Top SQL report open.

With the data for “Wait events” and “Top SQL” already loaded in the background, we can now click on “Correlate by time”.

What we get then is a new window which shows two columns with identifiers for data series for wait events, SQL_IDs or system statistics (depending on what kind of reports you had open) and a third column showing a correlation value within -100% and +100%.

A high correlation value means that the two data series have a strong correlation over time. A high negative value means that when the first series has high values, the second series has low values and vice versa.

You can browse the correlation data in this table, but the significant correlation values for a data series are also shown as a hint when you hover the mouse over the graph for a series.

Going back to our example above, it looks like this:

This shows that there is a correlation (over time) between the “SQL*Net more data from DB link” wait event and the SQL statement with the SQL_ID “f410crm1ck46k”.

(EDIT 1:) If we select this statement on the “Elapsed time” tab of the “Top SQL” page, we can visually confirm that there is a correlation over time between the wait event and the elapsed time for this SQL_ID.

Here’s another example of one SQL statement with a correlating value to another SQL statement:

This seems to indicate that the SQL with SQL_ID “cj5ywur2kvhj2” is called from within the parent SQL call for that the graph is shown above. You can see this kind of correlation frequently with PL/SQL blocks and scheduler jobs.

I guess these example show what the correlation report can do for you. I hope you find it useful.

Some additional notes:

  • The correlation report is time based, what means that there is not necessarily a logical correlation between two series. It could be just coincidence.
  • The correlations are calculated not by the absolute values for the data series, but on the relative series values compared to the maximum value of the data series. This is the reason why graphs for series with very high absolute values can have a high correlation value with graphs that have low absolute values. (EDIT 3: As Sergei pointed out in the comments below, it actually doesn’t matter if the correlation is calculated based on absolute or normalized values.)
  • You might not see all correlations that actually exist. For example, if one execution of an SQL statement spans multiple snapshots, the wait events produced by this SQL will show up in all of these snapshots. However, the SQL execution times in the Top SQL report will only show up in the snapshot when the execution finished. Though there is a logical correlation between the SQL execution time and the wait events triggered by the execution of the SQL, the correlation report can’t find it, because the correlation calculation is snapshot based.
  • (EDIT 2:) Special thanks and greetings to Yuri van Buren! His “Mining for gold in the AWR” session at the DOAG conference and our conversations inspired me to build the correlation report.
This entry was posted in Mumbai. Bookmark the permalink.

8 Responses to Correlation reports for Statspack and AWR data

  1. Does it mean if:
    Snapshot 1: Statistic A = 1; Statistic B = 100;
    Snapshot 2: Statistic A = 2; Statistic B = 200;
    Then there will be 100% correlation between A and B? If not, can you give your example?

    • Exactly.

      I added another screenshot above that shows the elapsed time graph for the SQL statement that correlates to the “SQL*Net more data from DB link” wait event. As you see there the absolute values for the two data series are far apart from another, but compared to their maximum value the two series do correlate over time.

    • I’m using the deltas between all snapshots (1 to 2, 2 to 3 ,3 to 4, etc.), not just the delta between first and the last.
      The later would not work, since the values have to be normalized to the maximum value of the series in the complete interval. If you had only one delta per series, this delta would also be the maximum value and all series would have a correlation of 100% to all other series.

      • OK, I didn’t realize it was about time series correlation. Why do you need to normalize data? Are you calculating modified covariance/correlation between two series? The standard correlation should not be sensitive to the normalization.

    • I am calculating standard correlation, but I am feeding it with normalized data which already was available in my queries for Top SQL, wait events and system statistics.. You are right – I could have taken the absolute values as well.

      • I have a tool (Lab128) which collects about 3000 Oracle statistics with good temporal resolution (every 6 secs). This is independent of AWR. Top SQL stats also included (default resolution 15 secs). So I was thinking for quite some time about automatic detection of dependencies between all those stats. Although I was not sure how productive this approach could be in everyday typical troubleshooting. How useful this feature has been for you? Does it provide clues beyond obvious things, which will make someone use it over and over again?

  2. I know your tool and actually quite a few people at the DOAG conference asked me if I knew it. 🙂

    I can’t say if it is useful for the data you gather and how you gather it (probably realtime) in your tool. You should know! 😉

    The DOAG session that made me implement this approach was titled “Mining for gold in AWR” and just like gold mining you probably seldomly find some, but if you find some it feels really good.
    It’s the same with the correlation report. Sometimes there is really no useful correlation, sometimes there is the obvious, sometimes it simply let’s you understand PL/SQL call hierachies or which SQL in a PL/SQL call had the most impact or even that no single SQL statement was a main contributor to the total elapsed time.
    And sometimes, for me especially for correlations between SQL_IDs and wait events – wait events that I didn’t even notice in the Wait events diagram – I find gold.

    The main problem I see with it is with the quality of the input data (long intervals, series which correlate don’t go into the same snapshot, things that don’t go into the snapshot at all, e.g. lots of similar statements that don’t use bind variables, etc.).

    I see it like this: If I do an analysis on an interval clicking on the correlation icon and waiting some additonal seconds (or maybe minutes) is worth the effort if I can deduce something useful “from time to time”.

    That is my experience and opinion. I am curious to hear other peoples experience with the correlation report.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s