I'd like to propose two enhancements to the PostgreSQL code, but I'm not sure if this is the correct mailing list. So if it's not then please let me know where I need to post this.
These are monitoring-centric enhancement requests since I'm trying to implement accurate monitoring in a secure fashion.
* General monitoring:
We have a need for a "monitoring" role in PostgreSQL that has read-only access to any "pg_stat" view. As of 9.4, only a super-user can read all columns of "pg_stat_activity", "pg_stat_replication", and "pg_stat_archiver" (there may be other restricted
views as well). These views provide critical insight on how well the cluster is operating and what is going on.
There appears to be only two ways to gain access to these views:
Option 1 is too big of a security risk to grant to a monitoring user. If the monitoring system gets compromised then the DB will be at risk too.
Option 2 requires creating, deploying, and maintaining sets of monitoring functions, which is a decent chunk of work in a large environment with dozens or hundreds of deployments, many running different versions of postgres possibly needing custom versions
of the functions. When you add the bureaucracy of a large IT organization and SOX and PCI compliance requirements it ends up being a PITA implementing or changing these functions when you only have a small DBA team.
* Streaming Replication Monitoring:
Make the "pg_stat_replication" view more persistent (maybe keep the rows for 24 hours or have a registration process?).
There appears to be no way to *accurately* monitor streaming replication via SQL alone currently. This is due to three different problems:
Ideally the master should be able to report standby lag time via SQL, even when there has been a disruption in connectivity with a standby.
The only accurate method I have found to measure standby lag is to create a synthetic update that runs periodically. This works, but is less than ideal and requires adding a table to every cluster (which then has to be vacuumed frequently too) and writing
and maintaining a process to update it and purge it.
These two enhancements would go a long way in making it easier to monitor PostgreSQL clusters and replication because it would eliminate a lot of custom coding requirements and enable us to pull metrics directly via simple SQL.
If anyone can provide insight on how I could accomplish these in a simple manner by other means then I'm all ears!
Thanks,
Rob
|