Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

Franck Pachot <pg.franck@xxxxxxxxxx> · Mon, 11 Oct 2021 00:09:32 +0200

On Sun, Oct 10, 2021 at 11:06 PM Jeff Holt <jeff.holt@xxxxxxxxxxxx> wrote:
TLDR; If I spend the time necessary to instrument the many functions that are the equivalent of the Oracle counterparts, would anyone pull those changes and use them? Specifically, for those who know Oracle, I'm talking about implementing:
The portion of the ALTER SESSION that enables extended SQL trace
Most of the DBMS_MONITOR and DBMS_APPLICATION_INFO packages
Instrument the thousand or so functions that are the equivalent of those found in Oracle's V$EVENT_NAME
Dynamic performance view V$DIAG_INFO
For the last 35 years, I've made my living helping people solve Oracle performance problems by looking at it, which means:
Trace a user experience and profile the trace file to (a) reveal where the time has gone and its algorithm and (b) make it easy to imagine the cost of possible solutions as well as the savings in response time or resources.
I've even submitted change requests to improve Oracle's tracing features while working for them and since those glorious five years.
Now looking closely at postgreSQL, I see an opportunity to more quickly implement Oracle's current feature list.
I've come to this point because I see many roadblocks for users who want to see a detailed "receipt" for their response time. The biggest roadblock is that without a lot of automation, a user of any kind must log into the server and attempt to get the data that are now traditionally child's play for Oracle. The second biggest roadblock I see is the recompilation that is required for the server components (i.e., postgreSQL, operating system). My initial attempts to get anything useful out of postgreSQL were dismal failures and I think it should be infinitely easier.
Running either dtrace and eBPF scripts on the server should not be required. The instrumentation and the code being instrumented should be tightly coupled. Doing so will allow anyone on any platform for any PostgreSQL version to get a trace file just as easily as people do for Oracle.

I hope this kind of instrumentation will make its way to PostgreSQL one day. Knowing where the time is spent changes the performance troubleshooting approach from guess-and-try to a scientific method. This is what made Linux a valid OS for enterprises, when instrumentation reached the same level as we got on Unix. There's a demand for it in enterprises: for example, EDB Advanced Server implemented timed wait events. I'm sure having it in open source postgres will help to understand the performance issues encountered by users, then helping to improve the database. Profiling where the database time is spent should not be reserved to commercial databases. Having the source code visible is not sufficient to understand what happens in production. Observability should also be there.

There is a fear in the postgres community that features are implemented just because they exist in oracle, and mentioning oracle is often seen suspicious. Probably because of the risk of adding complexity for no user value. Here, about instrumentation, I think that looking at what Oracle did during 20 years is a good start. Because instrumentation is not an easy task. Some waits are too short to have meaningful timing (the timing itself may take more cpu cycles than the instrumentation itself). Some tasks are critical to be measured. Looking at what Oracle Support implemented in order to solve big customer problems can give a good basis. Of course, all this must be adapted for postgres. For example, a write system call may be a logical or physical write because there's no direct I/O. At least, a precise timing, aggregated to histograms, will help to distinguish which writes were filesystem hits, or storage cache hits, or went to disk. And on the most common platform, the overhead is minimal because getting the timestamp can be done in userspace.

Today, Linux has many tools that were not there when Oracle had to implement wait events. And people may think the Linux tools are sufficient today. However, getting system call time is not easy in production (strace must attach to the process) and other tools (perf) are only sampling: gives an idea but hides the details. Unfortunately, what we have from the OS gives interesting clues (for guess and try) but not enough facts (for scientific approach).

So the proposal is great, but there is also the risk of putting a large effort in describing the specification and maybe a patch, and that it is rejected. It should probably be discussed in the 
-hackers list (https://www.postgresql.org/list/pgsql-hackers/) first. And people will dislike it because it mentions Oracle. Or people will dislike it because they think this should be reserved to commercial forks. Or because it may introduce too much dependency on the OS. But some others will see the value of it. Discussions are good as long as they stay focused on the value of the community project. I don't have skills to contribute to the code, but will be happy to expose the need for this instrumentation (profiling time spent in database functions or system calls) as I have many examples for it.