Re: (Ideas) pg_dump in a production environment

Russell Smith <mr-russ@xxxxxxxxxx> · Tue, 24 May 2005 19:53:31 +1000

On Tue, 24 May 2005 02:12 pm, Tom Lane wrote:
> "Thomas F. O'Connell" <tfo@xxxxxxxxxxxx> writes:
> > I'd like to use pg_dump to grab a live backup and, based on the  
> > documentation, this would seem to be a realistic possibility. When I  
> > try, though, during business hours, when people are frequently  
> > logging in and otherwise using the application, the application  
> > becomes almost unusable (to the point where logins take on the order  
> > of minutes).
> 
> The pg_dump sources contain some comments about throttling the rate
> at which data is pulled from the server, with a statement that this
> idea was discussed during July 2000 and eventually dropped.  Perhaps
> you can think of a better implementation.

A brief look at the code suggests a couple of possibilities for fixing problems.
There seem to be a least two different issues here from the user point of view.

Issue 1: Large Tables cause server slowdown
---
There are two dump cases in my short reading of the pg_dump code.  Case 1 is
the copy dump, which is done as one command.  The server does most of the work.
Case 2 is the INSERT type dump, where the pg_dump client does most of the work
creating the INSERT statement.  Case 2 is done with a cursor, and it would be easy to
insert a fixed delay sleep at the end of a certain amount of record dumps.  I'm sure we
could work out the average size of a tuple in this case (2), and even pause after a certain
amount of data has been transferred.

I am unsure about how to attack Case 1, as mentioned it is handled mostly in the backend code
which we don't really control.  If it could be declared as a CURSOR you could you the same
principal as Case 2.  The current throttling suggestions are all based on time.  I think that
a data/counter based solution would be less intense on the system.  When counter is reached,
just do a usleep for the throttle time.

Issue 2: Full Backups of large amount of data saturate disk I/O (Many tables make it slow)
---
If the backup dump is large, and given all files will be sequentially scanned during the backed,
the server IO is going to be pushed to the limit.  A pause between dumping tables seems a
simple possibility to reduce the ongoing IO load on the server to allow for a period where other
requests can be served.  This would result in a bursty type performance improvement.  In environments
with large numbers of tables of a reasonable size, this could give a benefit.

---
In releases prior to 8.0, any sort of wait on a certain amount of data would possibly not evict high use
data as the wait time would mean that the frequently used data would have been accessed again,
meaning you would evict the seqscan data you requested for the previous part of the dump.
In post 8.0, or 8.1 with clock sweep, it's possibly the same situation with regard to the delays, but
you could possibly process larger amounts of data before the sleep, as you would keep recycling the same
buffers.  You would use the sleep to reduce disk IO more than the reduce cache eviction.

The problem with timing waits for any backups are the database is not able to be vacuumed.  In some
limited circumstances (like mine), If you have a long running transaction that blocks vacuum to certain
small high update tables, you lose performance as the table bloats and can only fix it with a vacuum full.

Both of these suggestions may be totally bogus.  So I suppose I'm asking for feedback on them to see if
they would be worthwhile implementing.

Regards

Russell Smith

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings