Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

henk de wit <henk53602@xxxxxxxxxxx> · Tue, 26 Aug 2008 22:53:40 +0200

> What do your various logs (pgsql, application, etc...) have to say?

There
is hardly anything helpful in the pgsql log. The application log
doesn't mention anything either. We log a great deal of information in
our application, but there's nothing out of the ordinary there,
although there's of course always a chance that somewhere we missed
something. 

> Can you read a java stack trace?  Sometimes slogging through them will
> reveal some useful information.

I can read a java stack trace very well, I'm primarily a Java developer. The stack trace is the following one:

org.postgresql.util.PSQLException:
An I/O error occured while sending to the backend.

    at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:218)

    at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)

    at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)

    at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:304)

    at
org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102)

    at
org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102)

Our
code simply executes the same statement.executeUpdate() that is does
about 500.000 times during a business day. As soon as this select query
is hitting this 22 GB table then there's a chance that 'suddenly' all
these utterly simply insert queries start failing. The insert query is
nothing special either. It's just an "INSERT INTO ... VALUES (...)"
type of thing. The select query can actually be a few different kinds
of queries, but basically the common thing between them is reading from
this 22 GB table. In fact, our system administrator just told me that
even the DB backup is able to trigger this behaviour. As soon as the
backup process is reading from this 22 GB table, the inserts on it
-may- start to fail.

> Sounds to me like your connections are timing out (what's your timeout
> in jdbc set to?)

There's
no explicit timeout being set. Queries can theoretically execute for
hours. In some rare cases, some queries indeed run for that long.

> A likely cause is that you're getting big checkpoint spikes.  What
> does vmstat 10 say during these spikes?

It's
hard to reproduce the problem. We're trying to simulate it in on our
testing servers but haven't been successfull yet. The problem typically
lasts for only a minute a time on the production server and there's no
saying on when it will occur again. Of course we could try to enfore it
by running this select query continously, but for a production server
it's not an easy decission to actually do that. So therefore basically
all we were able to do now is investigate the logs afterwards. I'll try
to run vmstat though when the problem happens when I'm at the console. 

> If you're running the
> sysstate service with data collection then sar can tell you a lot.

Ok, I'm not a big PG expert so I'll have to look into what that means exactly ;) Thanks for the advice though.

Kind regards,
Henk
Express yourself instantly with MSN Messenger! MSN Messenger