On Wed, Dec 2, 2015 at 8:25 AM, David Kensiski <David@xxxxxxxxxxxx> wrote:
>
> On Tue, Dec 1, 2015 at 9:12 AM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>>
>> On Mon, Nov 30, 2015 at 9:58 AM, David Kensiski <David@xxxxxxxxxxxx>
>> wrote:
>> > I am working with a client who has a 9.1 database rapidly
>> > approaching XID wraparound.
>>
>> The hard limit at 2 billion, or the soft limit at
>> autovacuum_freeze_max_age?
>
> We're approaching the hard limit -- we are about to break 1.5 billion.
>
>> > They also have an exceedingly large pg_largeobject table (4217 GB)
>> > that has never been vacuumed.  An attempt to vacuum this on a
>> > replica has run for days and never succeeded.
>>
>> What was slowing it down?  Reading?  Writing?  CPU?  fdatasync?
>> Locks?  Was it run with throttling (e.g. nonzero vacuum_cost_delay)
>> or without?
>
> I just talked to my colleague who had tested it, and it just stopped
> doing anything.  No CPU, no disk I/O, no apparent activity.  No bueno.

If you can get it going again, I'd either strace it or attach gdb to it
to get a backtrace, if you have those tools, to see what is going on.

Is there a lot of free space in the pg_largeobject table (i.e., was
vacuumlo run recently)?  I wonder if it was doing a very slow backwards
scan over the table in order to truncate away unused space.  The
problem is that the backwards scan might not trigger the kernel's
read-ahead code, so every page is read in as random IO rather than
sequential IO.  Depending on what monitoring tools you use, this can
look a lot like doing nothing: the disk is always busy, but in a
horribly inefficient way.

Assuming you are on minor release 9.1.10 or later, if this is the case
you should be able to just have another session do a `lock TABLE
pg_largeobject in access share mode;` and hold the lock for a while
(a concrete sketch is in the P.S. below).  This will cause the vacuum
to abandon the truncation scan and finish up the accounting for the
freezing.  You can then worry about finishing the truncation once the
wraparound danger is over.

>> What is the throughput available on your RAID?
>
> It's 6 drives in a RAID 10 configuration, so striped across three
> Seagate Barracuda drives.  Theoretically we should be able to get as
> much as 18 Gb/s, actual mileage may vary.

Does it have a non-volatile write buffer to absorb fsync requests
without having to wait for them to actually reach disk?

If not, you could have a problem with the small ring buffer that
vacuum uses.  In order to read in a new page, vacuum first needs to
kick out an existing one.  But in a freeze operation where every block
needs freezing, the existing page is almost certainly dirty, so it has
to be written out; and to write it out, vacuum must first write and
fsync the WAL record that covers the dirtying of that page.  Because
the ring buffer is small, this happens very often and can really slow
things down.  With a larger ring of buffers this would be less of a
problem: the buffers would have longer to cool off before being
reused, so their WAL would probably already be on disk by then.
Unfortunately, there is no way to increase the ring buffer size
without compiling your own postgres.

Cheers,

Jeff
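
P.S. To make the lock trick concrete, here is a minimal two-session
psql sketch.  The table name and lock mode are exactly as above; the
pg_sleep duration is just an illustrative placeholder -- hold the lock
for however long you want the truncation held off.

    -- Session 1: the freeze vacuum that is stuck in its backwards
    -- truncation scan (i.e. whatever is already running), e.g.:
    VACUUM FREEZE pg_largeobject;

    -- Session 2: request the weakest possible table lock.  It still
    -- conflicts with the ACCESS EXCLUSIVE lock the truncation scan
    -- holds, so the request blocks; on 9.1.10+ the vacuum notices the
    -- waiter, abandons the truncation, and finishes the freeze
    -- accounting.  Keep the transaction open to go on holding the lock.
    BEGIN;
    LOCK TABLE pg_largeobject IN ACCESS SHARE MODE;
    SELECT pg_sleep(600);  -- illustrative: hold for ten minutes
    COMMIT;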
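
P.P.S. To keep an eye on how close you are getting to the 2-billion
hard limit while you sort this out, the usual catalog query works on
9.1 (nothing here is specific to your setup):

    -- Transactions consumed since each database's oldest unfrozen XID;
    -- the hard limit is about 2^31 (~2.1 billion).
    SELECT datname, age(datfrozenxid) AS xid_age
      FROM pg_database
     ORDER BY xid_age DESC;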