Re: Major Performance decrease after some hours

"Peter Bauer" <peter.m.bauer@xxxxxxxxx> · Sun, 1 Oct 2006 20:25:58 +0200

2006/10/1, MaXX <bs139412@xxxxxxxxx>:
Peter Bauer wrote:
> 2006/10/1, MaXX <bs139412@xxxxxxxxx>:
>> Peter Bauer wrote:
>> [...]
>> > There are 10-15 postmaster processes running which use all the CPU
>> power.
>> > A restart of tomcat and then postgresql results in the same situation.
>> > Some postgres processes are in DELETE waiting or SELECT waiting.
>> > VACUUM runs through in just about 1-2 seconds and is run via cron
>> > every minute and reports that the fsm setting are high enough.
>>
>> Pure speculation: are you sure you aren't vacuuming too agressively?
>
> The previous configuration was to vacuum all 10 minutes. This resulted
> in 60-70% system load (harddisk i suppose) so i thought more vacuums
> would be a good idea.
It won't hurt to ensure that you will never have 2 vacuum processes
running at the same time...
>> The DELETE waiting and SELECT waiting sound to me like they are waiting
>> for a lock that another vacuum is holding. You've said that you spawn a
>> vacumm process by cron every minute, that may be fine when the server
>> isn't too loaded. But when vacuums begins to take longer cron will spawn
>> more and more vacuums killing your machine.
> When the postmaster processes eat up all CPU cycles the vacuum still
> does only take some seconds and there are no postgres VACUUM processes
> hanging around, so i don't think they hamper each other.
> I have the impression that the database "condition" is getting worse
> over time so the queries take longer and longer and at some point the
> I/O Exceptions start because the data cannot be written to the disk in
> time. I just don't know how to pinpoint this bad "condition" or
> whatever gets worse over time.

you can connect to your server with pgAdmin, go to "tools -> server
status" to identify wich process is causing troubles on a sunday ;-)
(you may have to turn on some logging option on postresql.conf) I find
it more friendly than psql but it's a matter of taste.
You are running pg on a Linux platform, try systat and iostat. (I'm
running FreeBSD, I suppose those 2 are availlable on Linux)

Long running transactions are not you friends too... I once made a typo
in a script which prevented commits from appenning... not good...

This was the first thing we checked when these problems appeared. The
only locking is done in functions which should not be a problem. Of
course it is possible that there are bugs which prevent some
transactions from being commited, but its a pretty big application so
i would need some advice for finding and debugging such problems.
Is it possible to check which transactions are currently running,
which thread or java process runs them and for how long?
I will attach a logfile to the reply to Toms mail which contains all
sql statements executed from the start of the loadtest to the point
the problems began, so maybe you can find something in it.

thx,
Peter