Hi
On Sat, 25 Jan 2025 at 18:00, Frits Hoogland <frits.hoogland@xxxxxxxxx> wrote:
Thank you Pavel, that is really useful. I can imagine other people thinking about getting fine-grained data from Postgres might wonder the same as I do about this. And really, from a computer's perspective I would say that once a second isn't really a high frequency?
I usually work with minute sampling and it is generally good enough (the statistics are cumulative, so you can lose a timestamp, but you never lose data).
Only when I try to investigate some special case do I use per-second sampling. When you investigate lock issues, even seconds are too coarse.
Regards
Pavel
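As an aside for anyone implementing this, the point about cumulative statistics can be made concrete with a small delta calculation. A minimal sketch, assuming the psycopg2 driver and a local "dbname=postgres" DSN (both illustrative, not part of the thread):

```python
import time
import psycopg2  # assumed driver; any PostgreSQL client would do

# Sketch: the counters in pg_stat_database are cumulative, so the difference
# between two samples gives the activity in the interval even if the exact
# sample timestamps are lost.
conn = psycopg2.connect("dbname=postgres")  # hypothetical local DSN
conn.autocommit = True
cur = conn.cursor()

def commits():
    cur.execute("SELECT sum(xact_commit)::bigint FROM pg_stat_database")
    return cur.fetchone()[0]

first, t0 = commits(), time.time()
time.sleep(60)                      # minute-level sampling, as suggested above
second, t1 = commits(), time.time()

print("commits/second over the interval:", (second - first) / (t1 - t0))
```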
If I time these queries, they take around 20 ms (local connection), so there is a relatively long time during which none of these objects, including pg_database, are actively queried.

I git grepped the source code; it seems there is a RowExclusiveLock taken on pg_database for manipulation of a database (addition, removal and change) in dbcommands.c, but I do think your reasoning is based on the columns datfrozenxid and datminmxid? There is a lock for updating the frozen xid and mxid for a database (vacuum.c: LockDatabaseFrozenIds, ExclusiveLock), but it seems a SELECT should play nicely with that?

By the way, it's interesting to see that both datfrozenxid and datminmxid are updated in place, with no read consistency provided.

Frits Hoogland
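One way to check this reasoning is to look at pg_locks while the sampling session still holds its lock: a plain SELECT should show only an AccessShareLock on pg_database, which conflicts with AccessExclusiveLock but not with RowExclusiveLock or ExclusiveLock. A minimal sketch, assuming the psycopg2 driver and a local "dbname=postgres" DSN (both illustrative):

```python
import psycopg2  # assumed driver; any PostgreSQL client would do

# Sketch: see which lock a plain SELECT on pg_database holds, by querying
# pg_locks from inside the same (still-open) transaction.
conn = psycopg2.connect("dbname=postgres")  # hypothetical local DSN
cur = conn.cursor()

cur.execute("SELECT count(*) FROM pg_database")   # takes AccessShareLock
cur.execute("""
    SELECT locktype, mode, granted
    FROM pg_locks
    WHERE relation = 'pg_database'::regclass
      AND pid = pg_backend_pid()
""")
for row in cur.fetchall():
    print(row)   # expected: ('relation', 'AccessShareLock', True)

conn.rollback()  # end the transaction, releasing the lock
conn.close()
```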
On 25 Jan 2025, at 14:32, Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:

Hi

On Sat, 25 Jan 2025 at 12:23, Frits Hoogland <frits.hoogland@xxxxxxxxx> wrote:

For monitoring database behaviour and trying to build a history of activity: if I were to create an application that creates a single connection and executes something like

select * from pg_stat_activity;
select * from pg_stat_database;
select * from pg_stat_bgwriter;
select * from pg_stat_wal;
select * from pg_settings;
select * from pg_database;

for which the queries are prepared, and executes that every 1 second, would there be any realistic danger or overhead that should be considered? My thinking is that the data for these catalogs is all in shared memory, and when executed serially these queries should not cause any significant resources to be taken?

The queries to all tables except pg_database every 1 second will probably have zero impact on performance. I am not sure about pg_database - it is a very important table, and your query can block operations that need an exclusive lock on this table. So theoretically, there can be some impact on performance.

Regards

Pavel

Thanks,
Frits Hoogland
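Putting the original question into code, a minimal sketch of the sampler it describes, assuming the psycopg2 driver and a local "dbname=postgres" DSN (both illustrative; this sketch does not prepare the statements, it simply re-executes the same text each second):

```python
import time
import psycopg2  # assumed driver; any client that keeps one session open works

# Sketch of the sampler from the question: one connection, the same set of
# statements re-executed once per second.
QUERIES = [
    "SELECT * FROM pg_stat_activity",
    "SELECT * FROM pg_stat_database",
    "SELECT * FROM pg_stat_bgwriter",
    "SELECT * FROM pg_stat_wal",
    "SELECT * FROM pg_settings",
    "SELECT * FROM pg_database",
]

conn = psycopg2.connect("dbname=postgres")  # hypothetical local DSN
conn.autocommit = True   # each statement commits immediately
cur = conn.cursor()

while True:
    started = time.time()
    for q in QUERIES:
        cur.execute(q)
        rows = cur.fetchall()   # hand the rows to whatever stores the history
    # sleep for the remainder of the 1-second interval
    time.sleep(max(0.0, 1.0 - (time.time() - started)))
```

Running with autocommit means the AccessShareLock taken on pg_database is released as soon as that statement finishes, rather than being held in an open transaction until the next sample, which keeps the window in which it could block (or be blocked by) an AccessExclusiveLock as small as possible.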