DataDog, which implements such metrics for Postgres, has run into multiple issues doing this type of thing. You may be able to search their bugs / repo to see what they were. I just can't remember them offhand, it's been a while.

On Jan 25, 2025, at 12:01 PM, Frits Hoogland <frits.hoogland@xxxxxxxxx> wrote:
I am looking at whether sampling key database catalog information every second would have any drawback whatsoever. I think you're saying that you think that isn't the case, except maybe for pg_database, and I figure that is because of the frozen xid and multixact fields per database.
If the database client application is too unpredictable to know what SQL it will produce, then having runtime data available at that granularity, so that what is going on can be reasonably reconstructed, is very convenient and allows tremendous insight. It would also allow using the wait events to spot any weird behaviour, such as short-lived peaks (pg_stat_statements can do that on a busy database, for example). And if there is no known drawback and such a short interval can be arranged: why not? I am not saying you are doing it wrong; this is about trying to figure out the boundaries of what would be technically possible without unreasonably affecting the database, as a thought experiment.
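To make that concrete, the kind of per-second wait event sample I have in mind would be something like this (just a sketch; the column list is only an example):

    -- sketch: one per-second sample of what every backend is doing / waiting on
    SELECT now()            AS sample_time,
           pid,
           backend_type,
           state,
           wait_event_type,
           wait_event
      FROM pg_stat_activity;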
Of course the gathered data needs to be organized so that you don't drown in it, and it shouldn't lead to the monitoring data swamping the system, either in memory or on disk, but that is a given.
Why would per second be too much for locks? Is there overhead to select from pg_locks, or pg_blocking_pids()?
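For reference, the kind of per-second lock sample I am thinking of would be something like this (a sketch only; pg_blocking_pids() is called once per waiting backend here):

    -- sketch: which backends are waiting on a lock, and which pids block them
    SELECT now()                  AS sample_time,
           l.pid,
           l.locktype,
           l.mode,
           pg_blocking_pids(l.pid) AS blocked_by
      FROM pg_locks l
     WHERE NOT l.granted;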
Again, please realise I am happy about and appreciative of the time you take; I am just toying with the idea described above.
On 25 Jan 2025, at 19:18, Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:
Hi
Thank you Pavel, that is really useful. I can imagine other people thinking about getting fine-grained data from Postgres might wonder the same things I do about this. And really, from a computer's perspective, I would say that once a second isn't really a high frequency?
I usually work with minute sampling and usually it is good enough (the statistics are cumulative, so you can lose a timestamp, but you never lose data).
Only when I try to investigate some special case do I use second sampling. When you investigate lock issues, then even seconds are too much.
Regards
Pavel
If I time how long these queries take, it's around 20 ms (over a local connection), so there is a relatively long time during which all of the objects, including pg_database, are not actively queried.
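(That was simply timed from psql with client-side timing switched on; a sketch of how I measured, not the monitoring code itself:)

    \timing on
    -- psql reports "Time: ..." after each statement, measured at the client
    SELECT * FROM pg_stat_activity;
    SELECT * FROM pg_database;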
I git grepped the source code; it seems that there is a RowExclusiveLock for pg_database manipulation (addition, removal and change of a database) in dbcommands.c, but I do think your reasoning is based on the columns datfrozenxid and datminmxid?
There is a lock for updating the frozen xid and minimum multixact id for a database (vacuum.c: LockDatabaseFrozenIds, ExclusiveLock), but it seems a SELECT should play nice with that?
By the way, it's interesting to see that both datfrozenxid and datminmxid are updated in place, with no read consistency provided.
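For reference, the columns I am talking about can be looked at like this (a sketch; age() and mxid_age() only make the raw values easier to interpret):

    -- sketch: the per-database freeze horizons that vacuum updates in place
    SELECT datname,
           datfrozenxid,
           age(datfrozenxid)    AS frozenxid_age,
           datminmxid,
           mxid_age(datminmxid) AS minmxid_age
      FROM pg_database;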
Hi
For monitoring database behaviour and trying to build a history of activity, suppose I create an application that opens a single connection and executes something like:

    select * from pg_stat_activity;
    select * from pg_stat_database;
    select * from pg_stat_bgwriter;
    select * from pg_stat_wal;
    select * from pg_settings;
    select * from pg_database;

with the statements prepared, and runs that every 1 second. Would there be any realistic danger or overhead that should be considered? My thinking is that the data behind these catalogs and views is all in shared memory, and that, executed serially, the queries do not take any significant resources?
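A minimal sketch of how that single connection could do the sampling from psql (the statement name sample_activity and the one-second interval are only illustrations; a real collector would of course store the rows somewhere):

    -- prepare once on the single connection...
    PREPARE sample_activity AS
        SELECT now() AS sample_time, pid, state, wait_event_type, wait_event, query
          FROM pg_stat_activity;
    -- ...execute it, then let psql repeat the last statement every second
    EXECUTE sample_activity;
    \watch 1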
Querying all of these tables excluding pg_database every 1 sec will probably have zero impact on performance.
I am not sure about pg_database; it is a very important table, and your query can block operations that need an exclusive lock on this table. So theoretically, there can be some impact on performance.
Regards
Pavel