Pablo Delgado Díaz-Pache <delgadop@xxxxxxxxx> writes:
> I'm having a strange problem with postgres & autovacuum
> Everything is working fine until I start getting the following errors ...
> and postgres stops working shortly after (it stops accepting connections)

> 2010-11-13 12:34:08.599 CET|1|||7104||4cde77b0.1bc0|2010-11-13 12:34:08
> CET|1/44303|0|| LOG: automatic vacuum of table
> "mrs.pg_catalog.pg_statistic": index scans: 1
>   pages: 0 removed, 189 remain
>   tuples: 132 removed, 4587 remain
>   system usage: CPU 0.00s/0.00u sec elapsed 0.04 sec
> 2010-11-13 13:24:40.998 CET|2|||3300||4cdc2ae6.ce4|2010-11-11 18:41:58
> CET||0|| WARNING: worker took too long to start; cancelled
> 2010-11-13 13:25:41.126 CET|3|||3300||4cdc2ae6.ce4|2010-11-11 18:41:58
> CET||0|| WARNING: worker took too long to start; cancelled
> 2010-11-13 13:26:41.254 CET|4|||3300||4cdc2ae6.ce4|2010-11-11 18:41:58
> CET||0|| WARNING: worker took too long to start; cancelled

Hm. The code comment above that warning says

	 * The only problems that may cause this code to
	 * fire are errors in the earlier sections of AutoVacWorkerMain,
	 * before the worker removes the WorkerInfo from the
	 * startingWorker pointer.

but it's hard to see what problem there could lead to an issue.
(In particular, I discount the idea that AutovacuumLock could be stuck,
because we had to acquire it in order to issue this message.)

But it strikes me that the code comment is wrong in one significant way:
if the postmaster were failing to heed SIGUSR1 at all, you could reach
the timeout here, because the fork-failed signal wouldn't get sent.

Given that you say it also stops accepting connections, I'm thinking
this is a postmaster problem not an autovacuum problem. But you've not
provided any information about that end of it. Exactly what happens
when you try to make a connection? Are there any entries at all in the
postmaster log? What about the kernel log?
Are you sure that new connections stop working *after* this happens,
and not at exactly the same time?

> OS: Centos 5.5
> Kernel: 2.6.18-194.26.1.el5
> Postgres version: 8.4.5 (installation out-of-the-box using yum)

Given that this is a Linux system, I think that an OOM kill on the
postmaster is a not-insignificant possibility. Or at least I would
think that if there weren't a PostmasterIsAlive check in the autovac
launcher loop. It's real hard to see how you could get more than one
of these messages if the postmaster were gone entirely.

Could you try strace'ing the postmaster process to see what it's doing
when this is happening?

			regards, tom lane

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
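
For the kernel-log question above, one rough way to look for OOM-killer
activity is to filter the log text for the usual phrases. This is only a
sketch: the helper name find_oom_lines is made up for illustration, and the
matched phrases are an assumption, since the exact wording of OOM-killer
messages differs between kernel versions.

```python
import re

# Assumption: these phrases cover common OOM-killer messages. The exact
# wording varies by kernel version, so adjust the pattern as needed.
OOM_PATTERN = re.compile(
    r"out of memory|oom-killer|killed process", re.IGNORECASE
)


def find_oom_lines(log_text, process_name=None):
    """Return kernel-log lines that look like OOM-killer activity.

    If process_name is given (e.g. "postmaster"), keep only the lines
    that also mention that name.
    """
    hits = []
    for line in log_text.splitlines():
        if not OOM_PATTERN.search(line):
            continue
        if process_name is not None and process_name not in line:
            continue
        hits.append(line)
    return hits
```

Feeding it the contents of the kernel log (or dmesg output) with
process_name="postmaster" would show whether the postmaster was ever
chosen as a victim. An empty result does not rule out an OOM kill, since
the log may have rotated or the message may be worded differently.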