Re: problems with large objects dump

Tom Lane <tgl@xxxxxxxxxxxxx> · Fri, 12 Oct 2012 21:31:54 -0400

I wrote:
> Sergio Gabriel Rodriguez <sgrodriguez@xxxxxxxxx> writes:
>> I have never used oprofile, but a few hours into the process I was able
>> to capture this report:
>> 1202449  56.5535  sortDumpableObjects

> Hm.  I suspect a lot of that has to do with the large objects; and it's
> really overkill to treat them as full-fledged objects since they never
> have unique dependencies.  This wasn't a problem when commit
> c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
> might be because of the additional constraints added in commit
> a1ef01fe163b304760088e3e30eb22036910a495.  I wonder if it's time to try
> to optimize pg_dump's handling of blobs a bit better.  But still, any
> such fix probably wouldn't make a huge difference for you.  Most of the
> time is going into pushing the blob data around, I think.

For fun, I tried adding 5 million empty blobs to the standard regression
database, and then did a pg_dump.  It took a bit under 9 minutes on my
workstation.  oprofile showed about 32% of pg_dump's runtime going into
sortDumpableObjects, which might make you think that's worth optimizing
... until you look at the bigger picture system-wide:

  samples|      %|
------------------
   727394 59.4098 kernel
   264874 21.6336 postgres
   136734 11.1677 /lib64/libc-2.14.90.so
    39878  3.2570 pg_dump
    37025  3.0240 libpq.so.5.6
    17964  1.4672 /usr/bin/wc
      354  0.0289 /usr/bin/oprofiled

So actually sortDumpableObjects took only about 1% of the system-wide CPU
cycles (32% of pg_dump's 3.26% share).  And remember this is with empty
objects.  If we'd been shoving 200GB of data through the dump, the data
pipeline would surely have swamped all else.
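
As a quick check of that 1% figure, the arithmetic can be sketched as
follows.  This is only an illustration in Python: the variable names are
made up for the example, and the 32% is the rounded value quoted above,
so the result is approximate.

```python
# Figures taken from the profile quoted in this message.
sort_share_of_pg_dump = 0.32      # "about 32%" of pg_dump's own runtime (rounded)
pg_dump_share_of_system = 3.2570  # pg_dump row of the system-wide profile, in percent

# Share of all system-wide samples attributable to sortDumpableObjects.
sort_share_of_system = sort_share_of_pg_dump * pg_dump_share_of_system

print(f"sortDumpableObjects: about {sort_share_of_system:.2f}% of all samples")
```

In other words, even eliminating sortDumpableObjects entirely would
recover only about one percent of the total CPU time in this test.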

So I think the original assumption that we didn't need to optimize
pg_dump's object management infrastructure for blobs still holds good.
If there's anything that is worth fixing here, it's the number of server
roundtrips being used ...

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)