Yes, we are now in the process of adding custom metrics/alerts around the xmin horizon across all of our postgres databases.
We will do a DB-wide VACUUM FULL as well (ironically, this incident started because VACUUM FULL failed last weekend).

On Thu, Nov 2, 2017 at 11:06 AM, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
Tom, Arjun,
* Tom Lane (tgl@xxxxxxxxxxxxx) wrote:
> Arjun Ranade <ranade@xxxxxxxxxxxxxxxxx> writes:
> > After dropping the replication slot, VACUUM FULL runs fine now and no
> > longer reports the "oldest xmin is far in the past"
>
> Excellent. Maybe we should think about providing better tools to notice
> "stuck" replication slots.
+1
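
Until such tooling exists, a check along the following lines might help spot a slot that is holding back the xmin horizon. This is only a rough sketch: the query reads the xmin and catalog_xmin columns of pg_replication_slots, while the helper name stuck_slots, the row layout, and the 50 million default threshold are assumptions made for illustration and should be tuned per installation. Running the query and feeding the rows in is left to whatever client library is already in use.

```python
# Query to run with an existing client; age() reports how many
# transactions old each slot's xmin / catalog_xmin is.
SLOT_QUERY = """
SELECT slot_name, active,
       age(xmin) AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
"""

def stuck_slots(rows, max_age=50_000_000):
    """Return (slot_name, active, worst_age) for slots whose xmin or
    catalog_xmin is older than max_age transactions, oldest first.

    rows: iterable of (slot_name, active, xmin_age, catalog_xmin_age),
    where either age may be None if the slot does not hold that horizon.
    max_age is a hypothetical alert threshold, not a PostgreSQL default.
    """
    flagged = []
    for slot_name, active, xmin_age, catalog_xmin_age in rows:
        ages = [a for a in (xmin_age, catalog_xmin_age) if a is not None]
        if ages and max(ages) > max_age:
            flagged.append((slot_name, active, max(ages)))
    return sorted(flagged, key=lambda r: r[2], reverse=True)
```

An inactive slot that shows up in this list is the likely candidate for the kind of problem seen in this thread.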
> In the meantime, you probably realize this already, but if global xmin
> has been stuck for months then you're going to have terrible bloat
> everywhere. Database-wide VACUUM FULL seems called for.
This, really, is also a lesson in "monitor your distance to transaction
wrap-around". You really should know something is up a lot sooner than
the warnings from PG showing up in the logs.
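
As a rough illustration of that kind of monitoring, the sketch below flags databases whose age(datfrozenxid) has grown past a chosen threshold. The query uses pg_database.datfrozenxid; the helper name databases_near_wraparound, the 1 billion default threshold, and the use of 2^31 as the limit are assumptions for illustration (the server actually refuses new transactions somewhat before that point).

```python
# Query to run with an existing client; one row per database.
WRAPAROUND_QUERY = """
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
"""

# Approximate wrap-around limit; the real stop point is a bit earlier.
APPROX_XID_LIMIT = 2**31

def databases_near_wraparound(rows, warn_age=1_000_000_000):
    """Return (datname, xid_age, approx_remaining) for databases whose
    oldest unfrozen XID is more than warn_age transactions old,
    worst first.

    rows: iterable of (datname, xid_age).
    warn_age is a hypothetical alert threshold chosen for this sketch.
    """
    flagged = [
        (datname, xid_age, APPROX_XID_LIMIT - xid_age)
        for datname, xid_age in rows
        if xid_age > warn_age
    ]
    return sorted(flagged, key=lambda r: r[1], reverse=True)
```

Graphing xid_age over time, rather than only alerting on a threshold, would also make a stuck horizon visible as a line that never drops after vacuum runs.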
Thanks!
Stephen