Create table/view/index hanging following server crash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Admin List!

I've had some adventures this morning with our db following an overheat of the machine the night before (gratis of our hosting facility's cooling problem). The system came back up following a box restart, and postgres seemed to recover well. However, attempting to create a table, temp table, index or view will hang (I've confirmed that no conflicting locks exist in pg_locks) and you'll be able to see the process's memory usage climb to around 4000M of memory and then fail with an "out of memory" error. A truss on the process (we are on Solaris, if you are used to linux, think of it as strace) wasn't terribly helpful (at least by the time we can get to it) - a long string of "brk(0xDF259DC8)                                 = 0" while it climbs to 4000M and then errors out.  Non-creates seem fine, we've tried large/small selects/inserts/updates, drop tables, alter tables. All seem as normal.

In looking around the catalogs to try and see any more clues, I took a look at pg_namespace. There were temp table namespaces listed there, however there should be no temp tables on the system right now at all. Here are some findings:

db=# select count(*) from pg_namespace where nspname ilike 'pg_temp%';
count
-------
  242
(1 row)

db=# select c.relname, c.relnamespace, n.nspname from  pg_class c
db-# join pg_namespace n on c.relnamespace=n.oid
db-# where n.nspname ~'pg_temp' order by n.nspname;
       relname        | relnamespace |  nspname
-----------------------+--------------+------------
trigger_signup_mem_id |    118538615 | pg_temp_49
trigger_signup        |    118538615 | pg_temp_49
(2 rows)

db=# select * from pg_temp_49.trigger_signup;
ERROR:  invalid page header in block 40 of relation "trigger_signup"
db=#

Obviously, we've got some corruption in temp tables, particularly considering the temp tables shouldn't even be there in the first place. No corruption has been seen elsewhere, but obviously we won't really know until we've done a full dump/restore.

A few ?s...
Is there a clean way to clear those temp namespaces out?
Any ideas on what it is that postgres might be waiting or checking for (that may now be damaged) in an attempt to create a table/view/index that could cause memory usage to skyrocket (not necessarily cpu)? Any tests I could run to gain more information? Is it possible that something about bad temp table namespaces in pg_namespace that could cause a create query to hang - perhaps as it attempts to check that no class of the same name already exists in that session?


Stats about the system:
Postgres 8.1.4
db size: 200+ GB
Inheritance is used extremely heavily, so in figuring out what could cause a create to hang, it may be of interest to know that there are:
101,745 tables
314,821 indexes
1,569 views
The last averages taken on the number of writes per hour on this database: ~3 million (this stat is a few weeks old)

Machine info:
OS: Solaris 10
Sunfire X4100 XL
2x AMD Opteron Model 275 dual core procs
8GB of ram

Thanks in advance for any thoughts,
Kim Hatcher

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux