On 7/30/09 11:14 AM, "Stefan Kaltenbrunner" <stefan@xxxxxxxxxxxxxxxx> wrote:

> Tom Lane wrote:
>> "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
>>> Since the dump to custom format ran longer than the full pg_dump
>>> piped directly to psql would have taken, the overall time to use this
>>> technique is clearly longer for our databases on our hardware.
>>
>> Hmmm ... AFAIR there isn't a good reason for dump to custom format to
>> take longer than plain text dump, except for applying compression.
>> Maybe -Z0 would be worth testing?  Or is the problem that you have to
>> write the data to a disk file rather than just piping it?
>
> I always dump with -Z0 (and compress afterwards, or even in a pipe to
> keep two cores busy), because otherwise custom dump times are simply
> ridiculous.
> However, Kevin is onto something here - on the typical 4-8 core boxes I
> tested, I got around a cores/2 speedup for the restore, which means
> that for a pure upgrade, or testing similar to what Kevin is doing,
> custom dump + parallel restore might result in no win or even a loss.
>
> On one of our datasets I did some benchmarking a while ago (for those
> who attended Bruce's pg_migrator talk @pgcon, these are the same
> numbers):
>
> * 150GB database (on disk - ~100GB as a plain text dump)
>
> time to dump (-Z0): 120min
> time to restore (single threaded): 180min
> time to restore (-j 16): 59min

Note also that with ext3 and XFS (untuned), parallel restore = HORRIBLY
FRAGMENTED tables, to the point of sequential scans being rather slow.  At
least they are mostly just interleaved with each other, so there is little
seeking backwards, but still... beware.

XFS with allocsize=64m or so interleaves them in reasonably large chunks,
though, and prevents significant fragmentation.

> however the problem is that this does not actually mean that parallel
> restore saves you ~120min of dump/restore time, because you get the
> following real runtimes:
>
> plain text dump + single threaded restore in a pipe: 188min
> custom dump to file + parallel restore: 179min

On the other hand, I find parallel restore extremely useful in the use case
where one DB is dumped to a backup and that backup is then restored on
several other machines.

Dump needs to be parallelized, or at least pipelined, to use more cores:
COPY on one thread, compression on another?

One trick with a dump - which works only if you have tables or schemas that
can safely be dumped in different transactions - is to manually dump
different slices of the DB concurrently.  This makes a huge difference when
it is possible.  Rough command sketches for these approaches are at the end
of this mail.

> this is without compression; with the default, custom dump + parallel
> restore is way slower than the simple approach on reasonable hardware.
>
> Stefan
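
For what it's worth, the "compress in a pipe" and "plain dump piped into
psql" variants discussed above look roughly like this (database names,
hosts, and paths are just placeholders):

    # custom-format dump with compression off; gzip runs on a second core
    pg_dump -Fc -Z0 mydb | gzip > /backups/mydb.dump.gz

    # plain text dump piped straight into the target cluster (the 188min case)
    pg_dump mydb | psql -d mydb -h newhost

Keep in mind that parallel pg_restore needs a regular, seekable archive file,
so a gzipped custom dump has to be decompressed back to disk before a -j
restore can be run against it.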
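
The custom dump + parallel restore path that produced the 59min figure above
is roughly the following (the names and the -j value are placeholders, and
parallel restore needs the 8.4 pg_restore):

    # dump to a custom-format archive with compression disabled
    pg_dump -Fc -Z0 -f /backups/mydb.dump mydb

    # restore into the target database with 16 parallel jobs
    pg_restore -j 16 -d mydb /backups/mydb.dump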
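
The allocsize hint mentioned above is an XFS mount option; a hypothetical
fstab entry (device and mount point made up) would look something like:

    # XFS with a 64MB allocation hint, so concurrently loaded tables end up
    # interleaved in large extents instead of finely fragmented ones
    /dev/sdb1  /var/lib/pgsql  xfs  allocsize=64m  0  0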
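
And a sketch of the manual "slice the DB" trick (schema names are made up;
each pg_dump runs in its own transaction/snapshot, hence the caveat about
tables or schemas that can safely be dumped separately):

    # dump two independent schemas concurrently on separate cores
    pg_dump -Fc -Z0 -n schema_a -f /backups/schema_a.dump mydb &
    pg_dump -Fc -Z0 -n schema_b -f /backups/schema_b.dump mydb &
    wait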