On 9/21/07, Nikita The Spider The Spider <nikitathespider@xxxxxxxxx> wrote: > On 9/21/07, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > > "Nikita The Spider The Spider" <nikitathespider@xxxxxxxxx> writes: > > > I'm seeing a problem where pg_dump takes at least 5 minutes to execute > > > no matter what I ask it to dump -- even a non-existent or empty table. > > > One possible red flag is that pg_type contains 56508 rows. This > > > doesn't seem excessive to me, but perhaps it should. > > > > That does seem like a lot --- what sort of types are they? Scalar, > > composite, what? It's fairly likely that no one has tried to optimize > > pg_dump for such a case. > > Aha, thanks. Didn't realize I was pushing the bounds of what was > reasonable. Here's the type counts: > typrelkind | the_count > ------------+----------- > | 114 > sequence | 11496 > composite | 12290 > ordinary | 13844 > TOAST | 9215 > view | 9699 > (6 rows) I think I found the problem. getTypes() calls findFuncByOid() twice for each item it sees. A comment at the top of findFuncByOid() says, "should hash this, but just do linear search for now". So that funtion is O(n)/2 where n = the # of functions found by the query in getFuncs(), and since it's called twice for each item in getTypes, the overall performance is O(m * n) where n = # of functions and m = # of types. As I said before, I have ~56000 items in pg_type and the query at the top of getFuncs returns ~98000 rows so for me performance is O(5.4 billion)-ish. Some experimentation backs up my assertion that this is the performance bottleneck. I created another database on the same machine with 10k each of tables, views and custom types. The tables contained a SERIAL PRIMARY KEY and a text column so that PG created indexes and TOAST tables to go along with them. The result was ~50000 items in pg_type. When I run pg_dump against this database, it completes in 4 seconds so simply having a lot of items in pg_type is not enough to recreate my problem. I added 100k functions to that database and pg_dump took several minutes to run (again, trying to dump a non-existent table), which is what I'm seeing on my production database. My database design aimed to make each of my 1600 or so schemas its own little self-sufficent world, hence the proliferation of types and functions. I didn't realize that at some level they're all effectively in the same namespace. Now I know. Sorry I can't offer a patch. It'd be fun to write C code again but my skills are pretty rusty and I'm afraid I'd butcher it. :-P Thanks for your help! Given that this problem seems to be triggered by a sort of edge case and the fix is non-trivial, I guess I should not expect a new version of pg_dump soon? Cheers -- Philip http://NikitaTheSpider.com/ Whole-site HTML validation, link checking and more ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend