On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce@xxxxxxxxxx> wrote: > On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote: >> pg_dump itself seems to have a lot of quadratic portions (plus another >> one on the server which it hits pretty heavily), and it hard to know >> where to start addressing them. It seems like addressing the overall >> quadratic nature might be a globally better option, but addressing >> just the problem with dumping one schema might be easier to kluge >> together. > > Postgres 9.2 will have some speedups for pg_dump scanning large > databases --- that might help. Those speed ups don't seem to apply here, though. I get the same performance in 9.0.7 as 9.2.beta1. There is an operation in pg_dump which is O(#_of_schemata_in_db * #_of_table_in_db), or something like that. The attached very crude patch reduces that to O(log_of_#_of_schemata_in_db * #_of_table_in_db) I was hoping this would be a general improvement. It doesn't seem be. But it is a very substantial improvement in the specific case of dumping one small schema out of a very large database. It seems like dumping one schema would be better optimized by not loading up the entire database catalog, but rather by restricting to just that schema at the catalog stage. But I haven't dug into those details. For dumping entire databases, It looks like the biggest problem is going to be LockReassignCurrentOwner in the server. And that doesn't seem to be easy to fix, as any change to it to improve pg_dump will risk degrading normal use cases. If we want to be able to efficiently dump entire databases in a scalable way, it seems like there should be some way to obtain a data-base-wide AccessShare lock, which blocks AccessExclusive locks on any object in the database, and turns ordinary object-level AccessShare lock requests into no-ops. I don't think you can get hundreds of thousands of simultaneously held and individually recorded AccessShare locks without causing bad things to happen. Cheers, Jeff
Attachment:
pg_dump_ns_search.patch
Description: Binary data
-- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance