Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

Mladen Gogala <gogala.mladen@xxxxxxxxx> · Tue, 5 Oct 2021 17:27:59 -0400

Comments in-line

On 10/5/21 16:24, Peter Geoghegan wrote:
> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@xxxxxxxxxxxx> wrote:
>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>>
>> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their response time.
>
> I have heard of method R. Offhand it seems roughly comparable to
> something like the Top-down Microarchitecture Analysis Method that low
> level systems programmers sometimes use, along with Intel's pmu-tools
> -- at least at a very high level. The point seems to be to provide a
> workflow that can plausibly zero in on low-level bottlenecks, by
> providing high level context. Many tricky real world problems are in
> some sense a high level problem that is disguised as a low level
> problem. And so all of the pieces need to be present on the board, so
> to speak.
>
> Does that sound accurate?

Yes, that is pretty accurate. It is essentially the same method
described in the "High Performance Computing" books. The trick is to
figure out what the process is waiting for and then reduce the wait
times. All computers wait at the same speed.
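
To make the "receipt" idea concrete, here is a minimal sketch of a response time profile in Python. Nothing in it comes from Postgres or Oracle: the function name and the timings are made up for illustration, and the wait names are only examples. It groups the time one request spent by what it was waiting for and ranks the result, so the largest wait is the first candidate for reduction.

```python
from collections import defaultdict


def response_time_profile(events):
    """Group (wait_name, seconds) pairs and rank them by total time.

    Returns a list of (wait_name, total_seconds, share_of_elapsed),
    largest contributor first.
    """
    totals = defaultdict(float)
    for name, seconds in events:
        totals[name] += seconds

    elapsed = sum(totals.values())
    profile = []
    for name, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = seconds / elapsed if elapsed else 0.0
        profile.append((name, seconds, share))
    return profile


# Made-up timings for one hypothetical request, in seconds.
sample = [
    ("DataFileRead", 0.25),
    ("CPU", 0.5),
    ("DataFileRead", 1.0),
    ("ClientRead", 0.25),
]

for name, seconds, share in response_time_profile(sample):
    print(f"{name:<14}{seconds:6.2f}s {share:6.1%}")
```

With these numbers, reads from data files account for most of the elapsed time, so that is where the tuning effort would go first.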

> One obvious issue with much of the Postgres instrumentation is that it
> makes it hard to see how things change over time. I think that that is
> often *way* more informative than static snapshots.
>
> I can see why you'd emphasize the need for PostgreSQL to more or less
> own the end to end experience for something like this. It doesn't
> necessarily follow that the underlying implementation cannot make use
> of infrastructure like eBPF, though. Fast user space probes provably
> have no overhead, and can be compiled-in by distros that can support
> it. There hasn't been a consistent effort to make that stuff
> available, but I doubt that that tells us much about what is possible.
> The probes that we have today are somewhat of a grab-bag, that aren't
> particularly useful -- so it's a chicken-and-egg thing.

Not exactly. There already is a very good extension for Postgres called 
pg_wait_sampling:

https://github.com/postgrespro/pg_wait_sampling

What is missing here is mostly documentation. This extension should
become a part of Postgres proper, and the events should be documented as
they are (mostly) documented for Oracle. Oracle uses trace files
instead. However, with the Postgres equivalence of files and tables,
this is not a big difference.
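
As a rough illustration of how sampled wait data can also show change over time, the Python sketch below subtracts two snapshots of cumulative sample counts. The row shape (pid, event_type, event, count) is an assumption loosely modeled on the extension's profile view and should be checked against its README; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def wait_delta(before, after):
    """Return per-event sample counts accumulated between two snapshots.

    Each snapshot is a list of (pid, event_type, event, count) rows with
    cumulative counts. The result is a list of ((event_type, event), delta)
    pairs, largest delta first; events that did not grow are dropped.
    """
    def totals(rows):
        counts = Counter()
        for _pid, event_type, event, count in rows:
            counts[(event_type, event)] += count
        return counts

    start, end = totals(before), totals(after)
    delta = {key: end[key] - start.get(key, 0) for key in end}
    return sorted(
        ((key, value) for key, value in delta.items() if value > 0),
        key=lambda kv: kv[1],
        reverse=True,
    )


# Illustrative snapshots taken some interval apart (made-up numbers).
before = [
    (101, "IO", "DataFileRead", 40),
    (102, "LWLock", "WALWrite", 5),
]
after = [
    (101, "IO", "DataFileRead", 90),
    (102, "LWLock", "WALWrite", 5),
    (103, "Lock", "transactionid", 12),
]

for (event_type, event), samples in wait_delta(before, after):
    print(f"{event_type}/{event}: +{samples} samples")
```

Only the waits that grew during the interval are reported, which is closer to a trace of what happened in that window than a single static snapshot.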

> It would probably be helpful if you could describe what you feel is
> missing in more general terms -- while perhaps giving specific
> practical examples of specific scenarios that give us some sense of
> what the strengths of the model are. ISTM that it's not so much a lack
> of automation in PostgreSQL. It's more like a lack of a generalized
> model, which includes automation, but also some high level top-down
> theory.

I am not Jeff, and my opinion carries nowhere near the same weight.
However, I do believe that we may not see Jeff Holt again in this
group, so I am providing my opinion instead. In Jeff's place, I would
at least be reluctant to return to this group.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com