Jeff Janes <jeff.janes@xxxxxxxxx> writes: > There is an operation in pg_dump which is O(#_of_schemata_in_db * > #_of_table_in_db), or something like that. > The attached very crude patch reduces that to > O(log_of_#_of_schemata_in_db * #_of_table_in_db) > I was hoping this would be a general improvement. It doesn't seem be. > But it is a very substantial improvement in the specific case of > dumping one small schema out of a very large database. Your test case in <CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA@xxxxxxxxxxxxxx> shows pretty conclusively that findNamespace is a time sink for large numbers of schemas, so that seems worth fixing. I don't like this patch though: we already have infrastructure for this in pg_dump, namely buildIndexArray/findObjectByOid, so what we should do is use that not invent something new. I will go see about doing that. > It seems like dumping one schema would be better optimized by not > loading up the entire database catalog, but rather by restricting to > just that schema at the catalog stage. The reason pg_dump is not built that way is that considerations like dump order dependencies are not going to work at all if it only looks at a subset of the database. Of course, dependency chains involving objects not dumped might be problematic anyway, but I'd still want it to do the best it could. > For dumping entire databases, It looks like the biggest problem is > going to be LockReassignCurrentOwner in the server. And that doesn't > seem to be easy to fix, as any change to it to improve pg_dump will > risk degrading normal use cases. I didn't try profiling the server side, but pg_dump doesn't use subtransactions so it's not clear to me why LockReassignCurrentOwner would get called at all ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance