Hi all I'm trying to track down the causes of an application crash and reviewing PG logs I'm seeing this: 2011-04-22 06:00:16 CEST LOG: checkpoint complete: wrote 140 buffers (3.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=27.937 s, sync=1.860 s, total=29.906 s 2011-04-22 06:04:47 CEST LOG: checkpoint starting: time 2011-04-22 06:05:20 CEST LOG: checkpoint complete: wrote 161 buffers (3.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=32.188 s, sync=0.890 s, total=33.157 s 2011-04-22 06:09:20 CEST ERROR: update or delete on table "automezzo" violates foreign key constraint "fk77e102615cffe609" on table "carico" 2011-04-22 06:09:20 CEST DETAIL: Key (id)=(13237) is still referenced from table "carico". 2011-04-22 06:09:20 CEST STATEMENT: delete from Automezzo where id=$1 2011-04-22 06:09:20 CEST ERROR: current transaction is aborted, commands ignored until end of transaction block 2011-04-22 06:09:20 CEST STATEMENT: SELECT key from CONFIGURAZIONE LIMIT 0 2011-04-22 06:09:47 CEST LOG: checkpoint starting: time 2011-04-22 06:10:15 CEST LOG: checkpoint complete: wrote 137 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=27.406 s, sync=0.781 s, total=28.360 s 2011-04-22 06:14:47 CEST LOG: checkpoint starting: time 2011-04-22 06:15:14 CEST LOG: checkpoint complete: wrote 136 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=27.078 s, sync=0.797 s, total=27.984 s 2011-04-22 06:19:47 CEST LOG: checkpoint starting: time 2011-04-22 06:20:20 CEST LOG: checkpoint complete: wrote 164 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=32.938 s, sync=0.594 s, total=33.625 s 2011-04-22 06:24:47 CEST LOG: checkpoint starting: time 2011-04-22 06:25:20 CEST LOG: checkpoint complete: wrote 160 buffers (3.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=31.875 s, sync=0.953 s, total=32.922 s 2011-04-22 06:29:47 CEST LOG: checkpoint starting: time 2011-04-22 06:30:14 CEST LOG: checkpoint complete: wrote 130 buffers (3.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=25.922 s, sync=0.937 s, total=26.937 s 2011-04-22 06:34:47 CEST LOG: checkpoint starting: time 2011-04-22 06:35:15 CEST LOG: checkpoint complete: wrote 132 buffers (3.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=26.578 s, sync=1.250 s, total=27.984 s 2011-04-22 06:39:47 CEST LOG: checkpoint starting: time 2011-04-22 06:40:15 CEST LOG: checkpoint complete: wrote 136 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=27.250 s, sync=0.984 s, total=28.453 s 2011-04-22 06:44:47 CEST LOG: checkpoint starting: time 2011-04-22 06:51:41 CEST LOG: checkpoint complete: wrote 108 buffers (2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=409.007 s, sync=4.672 s, total=414.070 s 2011-04-22 06:55:42 CEST LOG: could not receive data from client: No connection could be made because the target machine actively refused it. <goes on like the above until application restart> Now, I'm fairly new with PG and checkpoint internals but even after reading around (PG docs, http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm and various other posts from this ml) I haven't been able to figure out the following: -given the configuration attached (which is basically a vanilla one) and the number of buffers written at each execution, are these execution times normal or above average? -in the case of the execution that overruns past the timeout, what are the implications wrt the client application? -AFAIU client connections are basically stalled during checkpoints. Is it reasonable to infer that the fact that the application blocking on a getConnection() might be related to checkpoints being executed? -considering some tuning on the PG side, should I try increasing checkpoint_timeout and rising checkpoint_completion_target to lessen the impact of IO on the client or should I shorten the period so there's less stuff to write? from the number of buffers written on average I'd assume the first option is the one to go for but I might miss some bit of reasoning here... The server both PG and the application server (Tomcat) are running on is a Stratus ft2300 machine, which I think is setup to do RAID1. I've read about RAID5 not being a wise setup for disks hosting PG, what about RAID1? Postgres version is 8.3 http://postgresql.1045698.n5.nabble.com/file/n4332601/postgresql.conf postgresql.conf Any help will be greatly appreciated:) Thanks, and sorry for the long post. cheers Francesco -- View this message in context: http://postgresql.1045698.n5.nabble.com/Checkpoint-execution-overrun-impact-tp4332601p4332601.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance