Ups, missed the list recepient itself. Thenx Duncan for the analysis. This happend again, so I'm able to peek at the details you've pointed out. On Wed, 2006-09-27 at 09:33 -0700, Casey Duncan wrote: > Sounds like it was blocked (unsure by what). You can use pg_locks to > check that. That view currently contais 240 entries .... source of blocking. Now I have to dig the ultimate reason why those locks end up there (and are not being cleand). Any ideas? Whould those locks go away by themselves .... in time? > What was consuming all the CPU? I assume from what you wrote > previously that it is a postgres backend. If so, what was it doing? ----------------ps output (just one backend)------------ postgres 4759 0.0 2.0 19368 5292 ? S 21:02 0:00 postgres: czyn_738773 czyn 127.0.0.1(55431) UPDATE waiting ------------------------------------------------------- "czyn" is the name of the database, "czyn_738773" is database username. IP address is localhost, since the database purpose is web serbvice back-end. Still, I'm not sure what that UPDATE counld that be. > You can use ps, the pg_stat_activity view and strace to figure that out. ------------- exerpt from pg_stat_activity----------------- 17660 | czyn | 4759 | 19541 | czyn_738773 | <command string not enabled> | | 2006-09-27 21:02:05.914728+02 | 127.0.0.1 | 55431 (18 rows) ------------------------------------------ But this is not very meaningfull to me. > > ...another postgress-process restart, and another VACCUM ANALYSE, this > > one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And > > everything > > seams working. > > Was something else using the database previously? Was something else > different this time than the last? No. My impression was (naive, I know), that not all things were correctly cleand-up on the first restart. Now I think, that on the second go I was just a little bit more lucky - www service was calm for a little longer. That was anough for VACUUM to complete. Still, throuble started later. > How big is the data on disk? Is it possible that you have queries close to nothing: c.a. 200MiB > problem (hardware, os, fs). When you saw this happen did you happen > to check if it was user or system cpu usage? I had a corrupt It's the user CPU time. As it currently looks like some unknown UPDATE causes the trouble, how can I identify what it is? 1. the database is not very sophisticated, apart from what I've mentioned earlier, it does have just a few stored procedures in pgsql - nothing fancy. And there is no triggers just specialized views (although quite a lot of referencial integrity constrains). 2. machine is 'from the old days': 450MHz CPU 3. the system is currently under 'available-for-public' condition, so there may be 'waves' (peeks) of activity. 4. when the system was tested in-lab (small number of users - no risk / not-tested for havier load behavior), I haven't noticed this 'blocking'. Any ideas where should I look now, to pinpoint the problem? -- -R