On Sep 27, 2006, at 7:28 AM, Rafal Pietrak wrote:
Hi,
I fell into the following problem (unfortunately, the database
contents
has sensitive customer information, so can publish very little of
that).
Currently postgress process takes close to 100% CPU time.
I've restarted the process a moment ago, and it was calm for a brief
minute.
It started to consume the CPU once I've issued:
ALTER GROUP majster DROP USER rafal;
... which took close to 5minutes to complete. Then I've issued:
Sounds like it was blocked (unsure by what). You can use pg_locks to
check that.
VACUUM ANALYZE ;
... which I had to cancel when it didnt' completed within another
10minutes. CPU utilisation remains at 100% even after the VACCUM was
canceled.
VACUUM sometimes hangs around for a while after it is cancelled,
which can be annoying. I think this has been improved for 8.2 FWIW
What was consuming all the CPU? I assume from what you wrote
previously that it is a postgres backend. If so, what was it doing?
You can use ps, the pg_stat_activity view and strace to figure that out.
pg_dumpall was successfull during the large-CPU-usage time.
...another postgress-process restart, and another VACCUM ANALYSE, this
one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And
everything
seams working.
Was something else using the database previously? Was something else
different this time than the last?
the database is currently almost empty: c.a. 100k records within
c.a. 20
tables with another 30 or so views. There are c.a. 6k userids (roles)
created in the cluster.
How big is the data on disk? Is it possible that you have queries
scanning tables containing lots of dead tuples? If so a VACUUM FULL
would help, but that's totally speculative.
Any ideas where should I look (system tables? process stats??) to
dieagnose the problem? .... just to be prepared when it appears again?
Postgres itself should not spontaneously start using all of the cpu.
Some query operations must be running unless there is an underlying
problem (hardware, os, fs). When you saw this happen did you happen
to check if it was user or system cpu usage? I had a corrupt
filesystem once that cause pg to go off into the weeds consuming all
cpu forever, but it was all *system* cpu time. Also I couldn't shut
pg down because it was presumably caught in an eternal system call
unable to respond to signals. Your description above sounds different
than this to me, however.
-Casey