On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling <matthew@xxxxxxxxxxx> wrote:
> Another point is that from a business perspective, a database that has
> stopped responding is equally bad regardless of whether that is because the
> OOM killer has appeared or because the machine is thrashing. In both cases,
> there is a maximum throughput that the machine can handle, and if requests
> appear quicker than that the system will collapse, especially if the
> requests start timing out and being retried.

But there's a HUGE difference between a machine that has bogged down
under load so badly that you have to reset it and a machine that's had
the postmaster slaughtered by the OOM killer.  In the first situation,
while the machine is unresponsive, it should come right back up with a
coherent database after the restart.  OTOH, a machine with a dead
postmaster is far more likely to have a corrupted database when it gets
restarted.

> Likewise, I would be all for Postgres managing its memory better. It would
> be very nice to be able to set a maximum amount of work-memory, rather than
> a maximum amount per backend. Each backend could then make do with however
> much is left of the work-memory pool when it actually executes queries. As
> it is, the server admin has no idea how many multiples of work-mem are going
> to be actually used, even knowing the maximum number of backends.

Agreed.  It would be useful to have a cap on total work_mem, but it
would probably require all the backends to coordinate with each other,
which could be really slow if you're running a thousand or so
connections.
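
To make the sizing problem concrete (purely illustrative numbers, not
anything measured in this thread):

    # postgresql.conf -- hypothetical values for illustration
    max_connections = 1000
    work_mem = 32MB        # applies per sort/hash operation, per backend

A single query with, say, two sorts and a hash join can legitimately
grab 3 x 32MB = 96MB, and in the worst case a thousand backends doing
the same could reach for roughly 96GB of work memory.  That's why the
per-backend knob makes the real total so hard to predict, and why a
shared cap would need the backends to coordinate on a common pool.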