On Thu, Oct 27, 2011 at 11:47 AM, Samuel Gendler <sgendler@xxxxxxxxxxxxxxxx> wrote:
> I've got a large mixed-use database, with the data warehouse side of things
> consisting of several tables at hundreds of millions of rows, plus a number
> of tables with tens of millions. There is partitioning, but as the volume
> of data has risen, individual partitions have gotten quite large. Hardware
> is 2x 4-core 2.0GHz Xeon processors, 176GB of RAM, 4 drives in RAID 10 for
> WAL logs and 16 or 20 spindles for data, also in RAID 10. Total database
> size is currently 399GB (via pg_database_size()). It's also worth noting
> that we switched from 8.4 to 9.0.4 only about a month ago, and we were not
> seeing this problem on 8.4.x. The database is growing, but not at some kind
> of exponential rate. A full backup, compressed, on the old hardware was
> 6.3GB and took about 1:45:00 to be written. Recent backups are 8.3GB and
> taking 3 or 4 hours. We were not seeing all queries stall out during the
> backups on 8.4, so far as I am aware.
>
> The time it takes for pg_dump to run has grown from 1 hour to 3 and even 4
> hours over the last 6 months, with more than half of that increase
> occurring since we upgraded to 9.0.x. In the last several weeks (possibly
> since the upgrade to 9.0.4), we are seeing all connections getting used up
> (our main apps use connection pools, but monitoring and some utilities are
> making direct connections for each query, and some of them don't check for
> the prior query to complete before sending another, which slowly eats up
> available connections). Even the connection pool apps cease functioning
> during the backup, however, as all of the connections wind up in parse
> waiting state. I also see lots of sockets in CLOSE_WAIT state for what
> seems to be an indefinite period while the backup is running and all
> connections are used up. I assume all of this is the result of pg_dump
> starting a transaction or otherwise blocking other access. I can get
> everything using a pool; that's not a huge problem to solve, but that won't
> fix the fundamental problem of no queries being able to finish while the
> backup is happening.
>
> I know I'm not the only one running a database of this size. How do others
> handle backups? At the moment, I don't have replication happening. I can
> use the old hardware to replicate to. It doesn't have quite the I/O
> capacity and nowhere near as much RAM, but I wouldn't be looking to use it
> for querying unless I lost the primary, and it is definitely capable of
> handling the insert load, at least when the inserts are being done
> directly. I'm not sure if it is easier or harder for it to handle the same
> inserts via streaming replication. My question is, what are the performance
> repercussions of running a pg_dump backup off the replicated server? If it
> experiences the same kind of lockup, will SR get so far behind that it
> can't catch up? Is there some other preferred way to get a backup of a
> large db? And finally, is the lockout I'm experiencing actually the result
> of a bug or misuse of pg_dump in some way?

I can't speak to the slower backups on 9.0.x issue, but if I were you I'd be
implementing hot standby and moving the backups to the standby (just be
aware that pg_dump will effectively pause replication and cause WAL files to
accumulate during the dump).

merlin
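
A minimal sketch of what that setup looks like on 9.0, assuming a streaming
replication standby is already in place; the host names, paths, and database
name below are placeholders, not anything from this thread:

    # On the standby, in postgresql.conf (9.0 settings):
    #   hot_standby = on                   # accept read-only queries during recovery
    #   max_standby_streaming_delay = -1   # never cancel standby queries to apply WAL;
    #                                      # replay pauses and received WAL piles up
    #                                      # in pg_xlog for as long as the dump runs
    #   max_standby_archive_delay = -1     # same, for WAL restored from an archive

    # Take the backup from the standby instead of the primary:
    pg_dump -h standby.example.com -U postgres -Fc \
        -f /backups/mydb_$(date +%Y%m%d).dump mydb

    # Check how far behind replay is while/after the dump runs (on the standby):
    psql -h standby.example.com -U postgres -d mydb -c \
        "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"

Setting max_standby_streaming_delay = -1 is what keeps a long pg_dump from
being cancelled by replay conflicts, and it is also why replay falls behind
for the duration of the dump. Once the dump finishes, replay resumes and
works through the accumulated WAL; whether the standby catches up quickly
depends on how much WAL the primary generates during the dump window versus
the standby's apply rate.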
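
Separately, on the "parse waiting" / used-up connections symptom described in
the original message: one hedged way to see what the blocked backends are
actually waiting on while the dump runs is to query pg_stat_activity and
pg_locks. The column names below are the 9.0 ones (procpid, waiting,
current_query), and the connection options are placeholders:

    # Backends currently blocked on a lock, with the query they are trying to run:
    psql -U postgres -d mydb -c "
        SELECT procpid, usename, now() - query_start AS waited, current_query
          FROM pg_stat_activity
         WHERE waiting
         ORDER BY query_start;"

    # Lock requests that have been made but not yet granted:
    psql -U postgres -d mydb -c "
        SELECT locktype, relation::regclass AS relation, mode, pid
          FROM pg_locks
         WHERE NOT granted;"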