On Fri, Oct 8, 2021 at 11:40 PM Bruce Momjian <bruce@xxxxxxxxxx> wrote:
>
> On Fri, Oct 8, 2021 at 05:28:37PM +0200, Thomas Kellerer wrote:
> >
> > We typically use the AWR reports as a post-mortem analysis tool if
> > something goes wrong in our application (=customer specific projects)
> >
> > E.g. if there was a slowdown "last monday" or "saving something took
> > minutes yesterday morning", then we usually request an AWR report from
> > the time span in question. Quite frequently this already reveals the
> > culprit. If not, we ask them to poke in more detail into
> > v$session_history.
> >
> > So in our case it's not really used for active monitoring, but for
> > finding the root cause after the fact.
> >
> > I don't know how representative this usage is though.
>
> OK, that's a good usecase, and something that certainly would apply to
> Postgres. Don't you often need more than just wait events to find the
> cause, like system memory usage, total I/O, etc?

You usually need a variety of metrics to find what is actually causing
$random_incident, so the more you can aggregate in your performance tool
the better. Wait events are an important piece of that puzzle.

As a quick example involving wait events, I recently had to diagnose a
performance issue which turned out to be a process hitting the
64-subtransaction limit, with the well-known consequences. I had
pg_wait_sampling aggregated metrics available, so it was easy to tell that
the slowdown was due to that. Knowing exactly which application reached
those 64 subtransactions is another story.
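
To illustrate, here is a minimal sketch of the kind of aggregation that
diagnosis relied on, assuming the pg_wait_sampling extension is installed
(its pg_wait_sampling_profile view exposes pid, event_type, event and
count); the exact wait event names differ across PostgreSQL versions:

    -- Rank the wait events recorded by the sampling profile.
    SELECT event_type, event, sum(count) AS samples
    FROM pg_wait_sampling_profile
    GROUP BY event_type, event
    ORDER BY samples DESC
    LIMIT 10;

A large share of LWLock samples on the subtransaction SLRU (for example
SubtransSLRU, or SubtransControlLock on older releases) would point at the
subtransaction overflow described above.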