On Tue, Dec 1, 2015 at 9:12 AM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
> On Mon, Nov 30, 2015 at 9:58 AM, David Kensiski <David@xxxxxxxxxxxx> wrote:
>> I am working with a client who has a 9.1 database rapidly approaching XID
>> wraparound.
> The hard limit at 2 billion, or the soft limit at autovacuum_freeze_max_age?
We're approaching the hard limit -- we are about to break 1.5 billion.
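For reference, that 1.5 billion figure is from the usual age(datfrozenxid) check; roughly this, run on the master (nothing exotic, just the standard query):

  -- How far each database's oldest unfrozen XID is from wraparound.
  -- autovacuum_freeze_max_age is the soft limit that forces autovacuum;
  -- the hard limit where new writes stop is around 2 billion.
  SELECT datname,
         age(datfrozenxid) AS xid_age,
         current_setting('autovacuum_freeze_max_age') AS freeze_max_age
  FROM pg_database
  ORDER BY age(datfrozenxid) DESC;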
>> They also have an exceedingly large pg_largeobject table (4217
>> GB) that has never been vacuumed. An attempt to vacuum this on a replica
>> has run for days and never succeeded.
> What was slowing it down? Reading? Writing? CPU? fdatasync? Locks?
> Was it run with throttling (e.g. nonzero vacuum_cost_delay) or
> without?
I just talked to my colleague who had tested it, and it simply stopped doing anything: no CPU, no disk I/O, no apparent activity. No bueno.
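If we try it again I'll check whether the backend is simply stuck waiting on a lock -- something along these lines, as a rough sketch (9.1's pg_stat_activity still uses the procpid/current_query/waiting columns):

  -- Is the vacuum backend waiting on a lock?
  SELECT procpid, waiting, query_start, current_query
  FROM pg_stat_activity
  WHERE current_query ILIKE '%vacuum%';

  -- If waiting is true, look for ungranted locks and for anything
  -- holding a lock on pg_largeobject.
  SELECT locktype, relation::regclass AS rel, mode, granted, pid
  FROM pg_locks
  WHERE NOT granted
     OR relation = 'pg_largeobject'::regclass;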
> What is the throughput available on your RAID?
It's 6 drives in a RAID 10 configuration, so striped across three mirrored pairs of Seagate Barracuda drives. Theoretically we should be able to get as much as 18 Gb/s; actual mileage may vary.
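(Back of the envelope, with per-drive numbers that are just my guess for Barracudas of that vintage: if each mirrored pair sustains something like 100-150 MB/s of sequential reads, three striped pairs give maybe 300-450 MB/s, so a single read pass over the 4217 GB table is on the order of 4217 / 0.45 / 3600, i.e. roughly 2.5 to 4 hours, before any write traffic or cost-delay throttling.)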
>> Are there creative ways to do such a vacuum with minimal impact on
>> production? Even if I let the vacuum complete on the replica, I don't think
>> I can play accrued logs from the master, can I?
> No. And if you could replay the logs, I doubt it would have much of a
> different impact than just running the vacuum freeze directly on the
> master would. You just need to bite the bullet.
>
> At some point you need to read the entire table in one session, even
> if that means scheduling some downtime (or degraded-performance time)
> in order to do it. It will also need to rewrite the entire table, but
> if there are "vacuum freeze" runs that are attempted but don't run to
> completion, their partial work will lessen the amount of writing (but
> not reading) that the ultimately successful vacuum will need to do. So
> start a vacuum freeze now; if you end up needing to cancel it, at
> least part of its work will not be wasted.
Unpleasant, but if that's what we have to do, we have to do it. :-(
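When we schedule it, I'm thinking something along these lines in a superuser session on the master (just a sketch; the cost-delay and maintenance_work_mem values are guesses we'd tune for our box):

  -- Unthrottled freeze of the big table; set a nonzero
  -- vacuum_cost_delay instead if we need to limit the I/O impact.
  SET vacuum_cost_delay = 0;
  SET maintenance_work_mem = '1GB';
  VACUUM FREEZE VERBOSE pg_largeobject;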
--Dave