Hello Admin List!
I've had some adventures this morning with our db following an overheat of the machine the night before (gratis of our hosting facility's cooling problem). The system came back up following a box restart, and postgres seemed to recover well. However, attempting to create a table, temp table, index or view will hang (I've confirmed that no conflicting locks exist in pg_locks) and you'll be able to see the process's memory usage climb to around 4000M of memory and then fail with an "out of memory" error. A truss on the process (we are on Solaris, if you are used to linux, think of it as strace) wasn't terribly helpful (at least by the time we can get to it) - a long string of "brk(0xDF259DC8) = 0" while it climbs to 4000M and then errors out. Non-creates seem fine, we've tried large/small selects/inserts/updates, drop tables, alter tables. All seem as normal. In looking around the catalogs to try and see any more clues, I took a look at pg_namespace. There were temp table namespaces listed there, however there should be no temp tables on the system right now at all. Here are some findings: db=# select count(*) from pg_namespace where nspname ilike 'pg_temp%'; count ------- 242 (1 row) db=# select c.relname, c.relnamespace, n.nspname from pg_class c db-# join pg_namespace n on c.relnamespace=n.oid db-# where n.nspname ~'pg_temp' order by n.nspname; relname | relnamespace | nspname -----------------------+--------------+------------ trigger_signup_mem_id | 118538615 | pg_temp_49 trigger_signup | 118538615 | pg_temp_49 (2 rows) db=# select * from pg_temp_49.trigger_signup; ERROR: invalid page header in block 40 of relation "trigger_signup" db=# Obviously, we've got some corruption in temp tables, particularly considering the temp tables shouldn't even be there in the first place. No corruption has been seen elsewhere, but obviously we won't really know until we've done a full dump/restore. A few ?s... Is there a clean way to clear those temp namespaces out? Any ideas on what it is that postgres might be waiting or checking for (that may now be damaged) in an attempt to create a table/view/index that could cause memory usage to skyrocket (not necessarily cpu)? Any tests I could run to gain more information? Is it possible that something about bad temp table namespaces in pg_namespace that could cause a create query to hang - perhaps as it attempts to check that no class of the same name already exists in that session? Stats about the system: Postgres 8.1.4 db size: 200+ GB Inheritance is used extremely heavily, so in figuring out what could cause a create to hang, it may be of interest to know that there are: 101,745 tables 314,821 indexes 1,569 views The last averages taken on the number of writes per hour on this database: ~3 million (this stat is a few weeks old) Machine info: OS: Solaris 10 Sunfire X4100 XL 2x AMD Opteron Model 275 dual core procs 8GB of ram Thanks in advance for any thoughts, |