On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling <matthew@xxxxxxxxxxx> wrote:
> Another point is that from a business perspective, a database that has
> stopped responding is equally bad regardless of whether that is because the
> OOM killer has appeared or because the machine is thrashing. In both cases,
> there is a maximum throughput that the machine can handle, and if requests
> appear quicker than that the system will collapse, especially if the
> requests start timing out and being retried.

But there's a HUGE difference between a machine that has bogged down
under load so badly that you have to reset it and a machine that's had
the postmaster slaughtered by the OOM killer.  In the first situation,
while the machine is unresponsive, it should come right back up with a
coherent database after the restart.  OTOH, a machine with a dead
postmaster is far more likely to have a corrupted database when it gets
restarted.

> Likewise, I would be all for Postgres managing its memory better. It would
> be very nice to be able to set a maximum amount of work-memory, rather than
> a maximum amount per backend. Each backend could then make do with however
> much is left of the work-memory pool when it actually executes queries. As
> it is, the server admin has no idea how many multiples of work-mem are going
> to be actually used, even knowing the maximum number of backends.

Agreed.  It would be useful to have a cap on total work_mem, but it
would probably require all the backends to coordinate with each other,
which could be really slow if you're running a thousand or so
connections.
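
To make the sizing problem concrete (purely illustrative numbers, not
anything measured in this thread):

    # postgresql.conf -- hypothetical values for illustration
    max_connections = 1000
    work_mem = 32MB        # applies per sort/hash operation, per backend

A single query with, say, two sorts and a hash join can legitimately
grab 3 x 32MB = 96MB, and in the worst case a thousand backends doing
the same could reach for roughly 96GB of work memory.  That's why the
per-backend knob makes the real total so hard to predict, and why a
shared cap would need the backends to coordinate on a common pool.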