Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

Tim <timfosho@xxxxxxxxx> · Tue, 5 Oct 2021 20:02:30 -0400

Jeff Holt is probably pretty embarrassed there's some blowhard making a scene using his name in a casual mailing list thread.

On Tue, Oct 5, 2021 at 5:28 PM Mladen Gogala <gogala.mladen@xxxxxxxxx> wrote:
Comments in-line

On 10/5/21 16:24, Peter Geoghegan wrote:

> On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt <jeff.holt@xxxxxxxxxxxx> wrote:
>> Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
>>
>> I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their response time.
>
> I have heard of method R. Offhand it seems roughly comparable to
> something like the Top-down Microarchitecture Analysis Method that low
> level systems programmers sometimes use, along with Intel's pmu-tools
> -- at least at a very high level. The point seems to be to provide a
> workflow that can plausibly zero in on low-level bottlenecks, by
> providing high level context. Many tricky real world problems are in
> some sense a high level problem that is disguised as a low level
> problem. And so all of the pieces need to be present on the board, so
> to speak.
>
> Does that sound accurate?

Yes, that is pretty accurate. It is essentially the same method
described in the "High Performance Computing" books. The trick is to
figure out what the process is waiting for and then reduce the wait
times. All computers wait at the same speed.

> One obvious issue with much of the Postgres instrumentation is that it
> makes it hard to see how things change over time. I think that that is
> often *way* more informative than static snapshots.
>
> I can see why you'd emphasize the need for PostgreSQL to more or less
> own the end to end experience for something like this. It doesn't
> necessarily follow that the underlying implementation cannot make use
> of infrastructure like eBPF, though. Fast user space probes provably
> have no overhead, and can be compiled-in by distros that can support
> it. There hasn't been a consistent effort to make that stuff
> available, but I doubt that that tells us much about what is possible.
> The probes that we have today are somewhat of a grab-bag, that aren't
> particularly useful -- so it's a chicken-and-egg thing.

Not exactly. There already is a very good extension for Postgres called
pg_wait_sampling:

https://github.com/postgrespro/pg_wait_sampling

What is missing here is mostly the documentation. This extension should
become a part of Postgres proper, and the events should be documented as
they are (mostly) documented for Oracle. Oracle uses trace files
instead. However, with the Postgres equivalence of files and tables,
this is not a big difference.

> It would probably be helpful if you could describe what you feel is
> missing in more general terms -- while perhaps giving specific
> practical examples of specific scenarios that give us some sense of
> what the strengths of the model are. ISTM that it's not so much a lack
> of automation in PostgreSQL. It's more like a lack of a generalized
> model, which includes automation, but also some high level top-down
> theory.

I am not Jeff, and my opinion is not as valuable and doesn't carry the
same weight, by far. However, I do believe that we may not see Jeff Holt
again on this group, so I am providing my opinion instead. At least I
would, in Jeff's place, be reluctant to return to this group.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com