Re: pg_dump and thousands of schemas

Tom Lane <tgl@xxxxxxxxxxxxx> · Fri, 25 May 2012 12:56:17 -0400

Jeff Janes <jeff.janes@xxxxxxxxx> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db *
> #_of_table_in_db), or something like that.
> The attached very crude patch reduces that to
> O(log_of_#_of_schemata_in_db * #_of_table_in_db)

> I was hoping this would be a general improvement.  It doesn't seem be.
>  But it is a very substantial improvement in the specific case of
> dumping one small schema out of a very large database.

Your test case in
<CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA@xxxxxxxxxxxxxx>
shows pretty conclusively that findNamespace is a time sink for large
numbers of schemas, so that seems worth fixing.  I don't like this
patch though: we already have infrastructure for this in pg_dump,
namely buildIndexArray/findObjectByOid, so what we should do is use
that not invent something new.  I will go see about doing that.

> It seems like dumping one schema would be better optimized by not
> loading up the entire database catalog, but rather by restricting to
> just that schema at the catalog stage.

The reason pg_dump is not built that way is that considerations like
dump order dependencies are not going to work at all if it only looks
at a subset of the database.  Of course, dependency chains involving
objects not dumped might be problematic anyway, but I'd still want it
to do the best it could.

> For dumping entire databases, It looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server.  And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.

I didn't try profiling the server side, but pg_dump doesn't use
subtransactions so it's not clear to me why LockReassignCurrentOwner
would get called at all ...

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance