On 10/5/21 13:24, Peter Geoghegan wrote:
> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@xxxxxxxxxxxx> wrote:
>> Now looking closely at postgreSQL, I see an opportunity to more quickly
>> implement Oracle's current feature list.
>>
>> I've come to this point because I see many roadblocks for users who want
>> to see a detailed "receipt" for their response time.
>
> It would probably be helpful if you could describe what you feel is
> missing in more general terms -- while perhaps giving specific
> practical examples of specific scenarios that give us some sense of
> what the strengths of the model are. ISTM that it's not so much a lack
> of automation in PostgreSQL. It's more like a lack of a generalized
> model, which includes automation, but also some high level top-down
> theory.

Back in my Oracle days, I formally used Method R on a few consulting gigs while working with Hotsos (RIP Gary). Method R is brilliant, and I referenced it in my PostgreSQL user group talk about wait events in PG:

https://www.slideshare.net/ardentperf/wait-whats-going-on-inside-my-database-173880246

I'm not the author of Method R, but I would describe it as a methodical approach to consistently solving business problems rooted in database performance, faster than any other methodical approach, built on a foundation of wait events, queueing theory and tracing (aka logging). But the most brilliant part is how Cary Millsap's tireless efforts to simplify, automate and educate have made it accessible to ordinary data analysts and project managers all over the world who speak SQL but not C.

PostgreSQL added wait events starting in 9.6, and the last thing that's missing is an integrated way to trace or log them. A simple starting point could be a session-level GUC that enables a hook in pgstat_report_wait_start() and pgstat_report_wait_end() to just drop messages in the log.
These log messages could then easily be processed to generate profiles similar to the ones we used with other databases. Basically I agree 100% with Jeff that while you can do these things with perf probes or eBPF, there are massive advantages to having it baked into the database. With the right tools, this makes session profiling available to regular users (who do their day jobs with Excel rather than eBPF).

However, one problem to watch out for will be whether the existing PostgreSQL logging infrastructure can handle this. It would probably need higher-precision timestamps (I need to check what csvlog has), and it could still be a lot of volume with some lightweight locks. Whereas Oracle had each individual process write its wait event trace messages to its own file, today PostgreSQL only supports either the single-system-wide-file logging collector, or syslog, which I think can only split to 8 destinations (and may be lossy).

There's another use case where high logging bandwidth could also be useful: temporarily logging all SQL statements to capture a workload. Next time I see someone take down their production database because the pgBadger doc said "log_min_duration_statement = 0" ... WHY PGBADGER WHY?

Anyway, I do hope there will be some improvements in this area of PostgreSQL. I'm not much of a C coder, but maybe I'll take a swing at it some day!

Anyway, Jeff, nice to see you here - and this is a topic I've thought about a lot too. PostgreSQL is a pretty cool bit of software, and an even cooler group of people around it. Hope to see you around some more. :)

-Jeremy

PS. "Tracing versus sampling" was the perpetual debate among performance engineers... we could have some good fun debating along those lines too. Hold my beer.

-- 
http://about.me/jeremy_schneider