On Thu, Oct 27, 2011 at 11:47 AM, Samuel Gendler <sgendler@xxxxxxxxxxxxxxxx> wrote:
> I've got a large mixed-use database, with the data warehouse side of things
> consisting of several tables at hundreds of millions of rows, plus a number
> of tables with tens of millions. There is partitioning, but as the volume
> of data has risen, individual partitions have gotten quite large. Hardware
> is 2x 4-core 2.0GHz Xeon processors, 176GB of RAM, 4 drives in RAID 10 for
> WAL logs and 16 or 20 spindles for data, also in RAID 10. Total database
> size is currently 399GB (via pg_database_size()). It's also worth noting
> that we switched from 8.4 to 9.0.4 only about a month ago, and we were not
> seeing this problem on 8.4.x. The database is growing, but not at some kind
> of exponential rate. A full backup, compressed, on the old hardware was
> 6.3GB and took about 1:45:00 to be written. Recent backups are 8.3GB and
> taking 3 or 4 hours. We were not seeing all queries stall out during the
> backups on 8.4, so far as I am aware.
>
> The time it takes for pg_dump to run has grown from 1 hour to 3 and even 4
> hours over the last 6 months, with more than half of that increase
> occurring since we upgraded to 9.0.x. In the last several weeks (possibly
> since the upgrade to 9.0.4), we are seeing all connections getting used up
> (our main apps use connection pools, but monitoring and some utilities are
> making direct connections for each query, and some of them don't check for
> the prior query to complete before sending another, which slowly eats up
> available connections). Even the connection pool apps cease functioning
> during the backup, however, as all of the connections wind up in parse
> waiting state. I also see lots of sockets in CLOSE_WAIT state for what
> seems to be an indefinite period while the backup is running and all
> connections are used up. I assume all of this is the result of pg_dump
> starting a transaction or otherwise blocking other access. I can get
> everything using a pool; that's not a huge problem to solve, but that won't
> fix the fundamental problem of no queries being able to finish while the
> backup is happening.
>
> I know I'm not the only one running a database of this size. How do others
> handle backups? At the moment, I don't have replication happening. I can
> use the old hardware to replicate to. It doesn't have quite the I/O
> capacity and nowhere near as much RAM, but I wouldn't be looking to use it
> for querying unless I lost the primary, and it is definitely capable of
> handling the insert load, at least when the inserts are being done
> directly. I'm not sure if it is easier or harder for it to handle the same
> inserts via streaming replication. My question is, what are the performance
> repercussions of running a pg_dump backup off the replicated server? If it
> experiences the same kind of lockup, will SR get so far behind that it
> can't catch up? Is there some other preferred way to get a backup of a
> large db? And finally, is the lockout I'm experiencing actually the result
> of a bug or misuse of pg_dump in some way?

I can't speak to the slower backups on 9.0.x issue, but if I were you I'd be
implementing hot standby and moving the backups to the standby (just be
aware that pg_dump will effectively pause replication and cause WAL files to
accumulate during the dump).

merlin
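
A minimal sketch of what that setup looks like on 9.0, assuming a streaming
replication standby is already in place; the host names, paths, and database
name below are placeholders, not anything from this thread:

    # On the standby, in postgresql.conf (9.0 settings):
    #   hot_standby = on                   # accept read-only queries during recovery
    #   max_standby_streaming_delay = -1   # never cancel standby queries to apply WAL;
    #                                      # replay pauses and received WAL piles up
    #                                      # in pg_xlog for as long as the dump runs
    #   max_standby_archive_delay = -1     # same, for WAL restored from an archive

    # Take the backup from the standby instead of the primary:
    pg_dump -h standby.example.com -U postgres -Fc \
        -f /backups/mydb_$(date +%Y%m%d).dump mydb

    # Check how far behind replay is while/after the dump runs (on the standby):
    psql -h standby.example.com -U postgres -d mydb -c \
        "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"

Setting max_standby_streaming_delay = -1 is what keeps a long pg_dump from
being cancelled by replay conflicts, and it is also why replay falls behind
for the duration of the dump. Once the dump finishes, replay resumes and
works through the accumulated WAL; whether the standby catches up quickly
depends on how much WAL the primary generates during the dump window versus
the standby's apply rate.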
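
Separately, on the "parse waiting" / used-up connections symptom described in
the original message: one hedged way to see what the blocked backends are
actually waiting on while the dump runs is to query pg_stat_activity and
pg_locks. The column names below are the 9.0 ones (procpid, waiting,
current_query), and the connection options are placeholders:

    # Backends currently blocked on a lock, with the query they are trying to run:
    psql -U postgres -d mydb -c "
        SELECT procpid, usename, now() - query_start AS waited, current_query
          FROM pg_stat_activity
         WHERE waiting
         ORDER BY query_start;"

    # Lock requests that have been made but not yet granted:
    psql -U postgres -d mydb -c "
        SELECT locktype, relation::regclass AS relation, mode, pid
          FROM pg_locks
         WHERE NOT granted;"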