On Fri, Oct 8, 2021 at 11:40 PM Bruce Momjian <bruce@xxxxxxxxxx> wrote:
>
> On Fri, Oct 8, 2021 at 05:28:37PM +0200, Thomas Kellerer wrote:
> >
> > We typically use the AWR reports as a post-mortem analysis tool if
> > something goes wrong in our application (=customer specific projects)
> >
> > E.g. if there was a slowdown "last monday" or "saving something took
> > minutes yesterday morning", then we usually request an AWR report from
> > the time span in question. Quite frequently this already reveals the
> > culprit. If not, we ask them to poke in more detail into
> > v$session_history.
> >
> > So in our case it's not really used for active monitoring, but for
> > finding the root cause after the fact.
> >
> > I don't know how representative this usage is though.
>
> OK, that's a good usecase, and something that certainly would apply to
> Postgres. Don't you often need more than just wait events to find the
> cause, like system memory usage, total I/O, etc?

You usually need a variety of metrics to find what is actually causing
$random_incident, so the more you can aggregate in your performance tool
the better. Wait events are an important piece of that puzzle.

As a quick example involving wait events, I recently had to diagnose a
performance issue which turned out to be a process hitting the
64-subtransaction limit, with the well-known consequences. I had
pg_wait_sampling aggregated metrics available, so it was easy to tell that
the slowdown was due to that. Knowing exactly which application reached
those 64 subtransactions is another story.
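
To illustrate, here is a minimal sketch of the kind of aggregation that
diagnosis relied on, assuming the pg_wait_sampling extension is installed
(its pg_wait_sampling_profile view exposes pid, event_type, event and
count); the exact wait event names differ across PostgreSQL versions:

    -- Rank the wait events recorded by the sampling profile.
    SELECT event_type, event, sum(count) AS samples
    FROM pg_wait_sampling_profile
    GROUP BY event_type, event
    ORDER BY samples DESC
    LIMIT 10;

A large share of LWLock samples on the subtransaction SLRU (for example
SubtransSLRU, or SubtransControlLock on older releases) would point at the
subtransaction overflow described above.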