Re: pg_dump and thousands of schemas

Tatsuo Ishii <ishii@xxxxxxxxxxxxxx> · Thu, 31 May 2012 14:29:01 +0900 (JST)

> I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
> the initial phase of acquiring the locks, but it does nothing for the
> lock-related slowdown occurring in all pg_dump's subsequent commands.
> I think we really need to get in the server-side fix that Jeff Janes is
> working on, and then re-measure to see if something like this is still
> worth the trouble.

Well, even with current backend, locking 100,000 tables has been done
in 3 seconds in my test. So even if Jeff Janes's fix is succeeded, I
guess it will just save 3 seconds in my case. and if number of tables
is smaller, the saving will smaller. This suggests that most of time
for processing LOCK has been spent in communication between pg_dump
and backend. Of course this is just my guess, though.

> I am also a tad concerned about whether we might not
> have problems with parsing memory usage, or some such, with thousands of
> tables being listed in a single command.

That's easy to fix. Just divide each LOCK statements into multiple
LOCK statements.

My big concern is, even if the locking part is fixed (either by Jeff
Jane's fix or by me) still much time in pg_dump is spent for SELECTs
against system catalogs. The fix will be turn many SELECTs into single
SELECT, probably using big IN clause for tables oids.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance