> Well, I've had time to read your previous message too.
>
> The first time you seem to imply the machine slowed down across all
> processes - ssh etc. Was that the case this time?

OK, the slowdown/crash was a different problem, which might have been
caused by a "too many files open" condition combined with a huge load.
There was no evidence that the server actually crashed, but we needed
to reboot it because it became completely unresponsive.

My current problem is this one:
http://archives.postgresql.org/pgsql-general/2005-11/msg00828.php

The current situation is different: the machine was almost idle, there
was one UPDATE running on a table, a lot of INSERTs on the same table
waiting for a lock held by the UPDATE's transaction, and a VACUUM
(started by autovacuum) running on the same table. No deadlock was
detected by the system, and I guess it would have detected one if there
was, because the whole thing had been running for more than 3 hours
according to pg_stat_activity, and I have the deadlock timeout set to
2 seconds.

That was 2 days ago, and today it happened again with exactly the same
picture: update, inserts, (auto)vacuum, on exactly the same table as
2 days ago. That can't be a coincidence; something is wrong either with
our code or with postgres. I do suspect our code is doing something not
completely kosher, but it still should not lock up in this way.

Now, according to pg_locks, the transaction of the UPDATE was holding
an exclusive lock on the table, which I can't explain, as we don't
acquire any table lock anywhere in our application, and the SQL of the
UPDATE comes from our application - I could identify the process it is
part of. The only explicit locking we do in that process is via a
SELECT ... FOR UPDATE on that table, but that should place no exclusive
lock on the table, right? So where is that lock coming from? And then,
why was I not able to cancel the backend via kill or pg_cancel_backend?

> When you say "locked" do you mean it was waiting on locks, was using
> all the CPU, unresponsive or just taking the query a long time?
>
> To prepare for next time I'd:
> 1. Leave ssh logged-in, run screen to get three sessions
> 2. Leave "top" running in the first - that'll show you process
>    activity/general load
> 3. Run "vmstat 10" in the second - that'll show you overall
>    memory/swap/disk/cpu usage.
> 4. The third session is then free to work in, if neither of the first
>    two show anything useful.

In that particular case I had a few terminals open, one of them
actually running top and the others free. They were not responding
either, so the machine was completely bogged down. I do routinely have
a few terminals open, a psql one included. But that kind of lock-down
has not happened again, and we changed servers in the meantime.

Cheers,
Csaba.
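
P.S. In case it helps anyone look at a similar situation: below is a
sketch of the kind of query one can run against pg_locks and
pg_stat_activity to see which backend holds which lock on a table and
which backends are waiting for it, plus how to ask a backend to cancel
its current statement. Column names vary between versions (older
releases use procpid/current_query instead of pid/query), 'the_table'
is just a placeholder name, and 12345 stands for a pid taken from the
first query - adjust to your setup.

    -- List granted and waiting locks on one table, together with the
    -- query and start time of each backend involved.
    SELECT l.pid, l.mode, l.granted, a.query_start, a.query
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.pid = l.pid
     WHERE l.relation = 'the_table'::regclass
     ORDER BY l.granted DESC, a.query_start;

    -- Ask one backend to cancel its current statement (superuser only).
    SELECT pg_cancel_backend(12345);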