On Tue, Aug 26, 2008 at 10:44 AM, henk de wit <henk53602@xxxxxxxxxxx> wrote:
>
> Hi,
>
> We're currently having a problem with queries on a medium sized table. This table is 22GB in size (via select pg_size_pretty(pg_relation_size('table'));). It has 7 indexes, which bring the total size of the table to 35 GB (measured with pg_total_relation_size).
>
> On this table we're inserting records with a relatively low frequency of +- 6~10 per second. We're using PG 8.3.1 on a machine with two dual core 2.4Ghz XEON CPUs, 16 GB of memory and Debian Linux. The machine is completely devoted to PG, nothing else runs on the box.
>
> Lately we're getting a lot of exceptions from the Java process that does these inserts: "An I/O error occured while sending to the backend." No other information is provided with this exception (besides the stack trace of course).

What do your various logs (pgsql, application, etc.) have to say? Can you read a Java stack trace? Sometimes slogging through one will reveal some useful information.

> The pattern is that for about a minute, almost every insert to this 22 GB table results in this exception. After this minute everything is suddenly fine and PG happily accepts all inserts again. We tried to nail the problem down, and it seems that every time this happens, a select query on this same table is in progress. This select query starts right before the insert problems begin and most often right after this select query finishes executing, inserts are fine again. Sometimes though inserts only fail in the middle of the execution of this select query. E.g. if the select query starts at 12:00 and ends at 12:03, inserts fail from 12:01 to 12:02.

Sounds to me like your connections are timing out (what's your JDBC timeout set to?). A likely cause is that you're getting big checkpoint spikes. What does vmstat 10 say during these spikes? If you're running the sysstat service with data collection, then sar can tell you a lot.
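If you capture `vmstat 10` output to a file while one of these one-minute episodes is happening, a few lines of script can pick out the samples where I/O wait or block writes jump. What follows is a minimal Python sketch, not a standard tool: the function names and the thresholds (30% I/O wait, 10000 blocks/s written) are hypothetical and should be tuned against a quiet-period baseline. It reads the column names from vmstat's own header line, so extra columns such as `st` on newer versions don't break it. Keep in mind that the first data row vmstat prints is an average since boot, not a live sample.

```python
from typing import Dict, Iterable, List


def parse_vmstat(lines: Iterable[str]) -> List[Dict[str, int]]:
    """Turn raw vmstat text lines into one dict per sample, keyed by column name."""
    header: List[str] = []
    rows: List[Dict[str, int]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # The column-name line ("r b swpd ... wa") can be reprinted by vmstat; track the latest one.
        if fields[0] == "r" and "wa" in fields:
            header = fields
            continue
        # Skip the "procs ---memory---" banner, anything before a header, and malformed lines.
        if not header or not fields[0].isdigit() or len(fields) != len(header):
            continue
        rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def io_spikes(rows: List[Dict[str, int]], wa_threshold: int = 30, bo_threshold: int = 10000) -> List[int]:
    """Indexes of samples whose I/O wait (%) or blocks written per second meet a threshold.

    The default thresholds are hypothetical placeholders, not recommended values.
    """
    return [
        i
        for i, row in enumerate(rows)
        if row.get("wa", 0) >= wa_threshold or row.get("bo", 0) >= bo_threshold
    ]
```

Feeding it the lines of a saved log (for example `parse_vmstat(open("vmstat.log"))`, with a file name of your choosing) and lining the flagged sample times up against the timestamps of the failed inserts would show whether the errors coincide with write storms.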
If it is a checkpoint issue, then you need more aggressive bgwriter settings, and possibly more bandwidth on your storage array. Note that you can force a checkpoint from a superuser account at the command line by issuing CHECKPOINT (e.g. from psql). You can always force one and see what happens to performance during it. You'll need to wait a few minutes or so between runs to see an effect.
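To put "more aggressive bgwriter settings" into numbers: in 8.3 the background writer wakes every `bgwriter_delay` and writes at most `bgwriter_lru_maxpages` buffers per round, so those two settings cap how fast it can clean buffers ahead of the backends. The Python sketch below only does that arithmetic. Its defaults are the 8.3 shipped defaults (200 ms, 100 pages) and the default 8 kB block size; the function names are hypothetical. It ignores `bgwriter_lru_multiplier`, which scales how many buffers the writer tries to clean per round up to that cap, and it ignores checkpoint writes themselves, which 8.3 spreads out according to `checkpoint_completion_target`.

```python
def bgwriter_max_write_rate(lru_maxpages: int = 100, delay_ms: int = 200, block_size: int = 8192) -> float:
    """Upper bound, in bytes per second, on the 8.3 background writer's LRU writes.

    Defaults mirror 8.3's bgwriter_lru_maxpages = 100, bgwriter_delay = 200ms,
    and the default 8 kB block size.
    """
    rounds_per_second = 1000.0 / delay_ms
    return lru_maxpages * block_size * rounds_per_second


def to_mib(rate_bytes: float) -> float:
    """Convert bytes per second to MiB per second."""
    return rate_bytes / (1024 * 1024)
```

With the defaults the cap works out to 4,096,000 bytes/s, about 3.9 MiB/s, which is small next to what a busy 22 GB table can dirty. A hypothetical `bgwriter_max_write_rate(500, 100)` (500 pages every 100 ms) raises the cap to about 39 MiB/s; whether the storage array can absorb that is exactly the bandwidth question above.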