Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

"Scott Marlowe" <scott.marlowe@xxxxxxxxx> · Tue, 26 Aug 2008 12:23:30 -0600

On Tue, Aug 26, 2008 at 10:44 AM, henk de wit <henk53602@xxxxxxxxxxx> wrote:
>
> Hi,
>
> We're currently having a problem with queries on a medium sized table. This table is 22GB in size (via select pg_size_pretty(pg_relation_size('table'));). It has 7 indexes, which bring the total size of the table to 35 GB (measured with pg_total_relation_size).
>
> On this table we're inserting records with a relatively low frequency of +- 6~10 per second. We're using PG 8.3.1 on a machine with two dual core 2.4Ghz XEON CPUs, 16 GB of memory and Debian Linux. The machine is completely devoted to PG, nothing else runs on the box.
>
> Lately we're getting a lot of exceptions from the Java process that does these inserts: "An I/O error occured while sending to the backend." No other information is provided with this exception (besides the stack trace of course).

What do your various logs (pgsql, application, etc...) have to say?
Can you read a java stack trace?  Sometimes slogging through them will
reveal some useful information.

> The pattern is that for about a minute, almost every insert to this 22 GB table results in this exception. After this minute everything is suddenly fine and PG happily accepts all inserts again. We tried to nail the problem down, and it seems that every time this happens, a select query on this same table is in progress. This select query starts right before the insert problems begin and most often right after this select query finishes executing, inserts are fine again. Sometimes though inserts only fail in the middle of the execution of this select query. E.g. if the select query starts at 12:00 and ends at 12:03, inserts fail from 12:01 to 12:02.

Sounds to me like your connections are timing out (what's your timeout
in jdbc set to?)

A likely cause is that you're getting big checkpoint spikes.  What
does vmstat 10 say during these spikes?  If you're running the
sysstate service with data collection then sar can tell you a lot.

If it is a checkpoint issue then you need more aggresive bgwriter
settings, and possibly more bandwidth on your storage array.

Note that you can force a checkpoint from a superuser account at the
command line.  You can always force one and see what happens to
performance during it.  You'll need to wait a few minutes or so
between runs to see an effect.