Earlier today we experienced some problems with one of our PG installations, running 8.0.3. It started with the DB's write performance being fairly slow (which is how we noticed it), and after some research I saw several of the backend processes growing in their memory usage to somewhere around 4-6GB RSS (the machine has 8GB of RAM and 1GB of swap). They would then swap-thrash until the kernel killed off a process, at which point I was able to issue a pg_ctl shutdown.

Looking in the logs after we got the machine responsive again, I saw the following errors (these are all from today):

ERROR: relation with OID 97737136 does not exist
CONTEXT: SQL statement "INSERT INTO _netadmin.sl_log_1 (log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata) VALUES (1, $1, $2, nextval('_netadmin.sl_action_seq'), $3, $4);"
ERROR: xlog flush request 33/553D66E0 is not satisfied --- flushed only to 32/FDECF4D8
CONTEXT: writing block 4945 of relation 1663/17230/96228095
ERROR: xlog flush request 33/553D66E0 is not satisfied --- flushed only to 32/FDECF4D8
CONTEXT: writing block 4945 of relation 1663/17230/96228095
WARNING: could not write block 4945 of 1663/17230/96228095
DETAIL: Multiple failures --- write error may be permanent.

These occur several times. The first one has shown up ever since we enabled Slony-I on some replication sets on the server (_netadmin.sl* is Slony stuff); the latter error, I'm not sure what would cause it.

At one point the following errors show up:

ERROR: could not open segment 1 of relation 1663/17230/96242110 (target block 61997056): No such file or directory
ERROR: could not open segment 1 of relation 1663/17230/96242110 (target block 61997056): No such file or directory
ERROR: could not open segment 1 of relation 1663/17230/96242110 (target block 775304242): No such file or directory
ERROR: could not open segment 1 of relation 1663/17230/96242110 (target block 1680881205): No such file or directory
ERROR: could not open segment 1 of relation 1663/17230/96242110 (target block 1680881205): No such file or directory

... followed by several more lines, with different target block numbers.

At one point, when trying to run a vacuum on one of the tables, we got the following errors:

2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 4947 is uninitialized --- fixing
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 4948 is uninitialized --- fixing
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 4949 is uninitialized --- fixing
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 4951 is uninitialized --- fixing
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 4952 is uninitialized --- fixing
... keeps going ...
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 11959 is uninitialized --- fixing
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 11992 is uninitialized --- fixing
2006-01-20 13:06:01 CST [local] WARNING: relation "inv_node" page 12118 is uninitialized --- fixing
2006-01-20 13:06:04 CST [local] ERROR: failed to re-find parent key in "inv_node_node_mac_key"

(inv_node_node_mac_key is the primary index on the inv_node table.)

When looking more closely at this table (and some other tables), we found that despite the tables having UNIQUE indices, several of them contained duplicate keys in the indexed column.
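For anyone who wants to check for the same kind of damage, a query along these lines will show values that should have been impossible under the unique index; the column name node_mac is only a guess based on the index name inv_node_node_mac_key, so adjust for the actual schema:

SELECT node_mac, count(*)   -- node_mac is assumed from the index name
FROM inv_node
GROUP BY node_mac
HAVING count(*) > 1;

Any rows returned are duplicates that the UNIQUE index should have rejected.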
We are currently in the process of cleaning up after the mess, but since this is a production system we want to find out what actually happened. Several people online have mentioned either running out of disk space or drive problems, but the DB is on a 300GB partition and uses barely 10GB of it, and the server shows no indication of hardware problems.

I can provide the full log (616K, ~13k lines) upon request.

- d.

--
Dominic J. Eidson
"Baruk Khazad! Khazad ai-menu!" - Gimli
-------------------------------------------------------------------------------
http://www.the-infinite.org/