Vick Khera wrote:
> On Tue, Aug 25, 2009 at 4:55 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> > I've always thought that the fd.c layer is more about not having to
> > configure the code explicitly for max-files-per-process limits.  Once
> > you get into ENFILE conditions, even if Postgres manages to stay up,
> > everything else on the box is going to start falling over.  So the
> > sysadmin is likely to have to resort to a reboot anyway.
>
> In my case, all sorts of processes were complaining about being unable
> to open files.  Once Pg panicked and closed all its files, everything
> came back to normal.  I didn't have to reboot because most everything
> was written to retry and/or restart itself, and nothing critical like
> sshd croaked.

Hmm.  How many DB connections were there at the time?  Are they
normally long-lived?  I'm wondering if the problem could be caused by
too many backends each holding its maximum number of open files.  On my
system, /proc/sys/fs/file-max says ~200k and the per-process limit is
1024, so about 200 backends with all their FDs in use would bring the
system to a near collapse that won't clear until Postgres is restarted.
That doesn't sound so far-fetched if the connections are long-lived,
perhaps coming from a pooler.  (A quick way to check these numbers on
your own box is below.)

Maybe we should have another inter-backend signal: when a process gets
ENFILE, signal all the other backends and have each of them close a
bunch of files.  (A rough sketch of what I mean follows my sig.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
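
To check the arithmetic on your own box, here's a throwaway C program
(mine, purely illustrative; not from the Postgres tree) that reads the
per-process limit with getrlimit() and the system-wide limit from
/proc/sys/fs/file-max, then divides:

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;
    long file_max;
    FILE *f;

    /* per-process soft limit on open files, e.g. 1024 */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    {
        perror("getrlimit");
        return 1;
    }

    /* system-wide limit; this path is Linux-specific */
    f = fopen("/proc/sys/fs/file-max", "r");
    if (f == NULL || fscanf(f, "%ld", &file_max) != 1)
    {
        fprintf(stderr, "could not read /proc/sys/fs/file-max\n");
        return 1;
    }
    fclose(f);

    printf("per-process limit: %ld, system-wide limit: %ld\n",
           (long) rl.rlim_cur, file_max);
    printf("processes needed to exhaust the file table: ~%ld\n",
           file_max / (long) rl.rlim_cur);
    return 0;
}

With ~200k and 1024 that prints ~195, which is where my "about 200
backends" figure comes from.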
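
And here's a self-contained POSIX sketch of the signal idea.  All the
names below are mine, not fd.c's; a real implementation would go
through fd.c's existing LRU machinery, which already closes
least-recently-used virtual FDs within a single backend when open()
fails.  The point is only the shape: the handler just sets a flag, the
broadcast goes to the whole process group, and the actual closing
happens back in the main loop.

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

#define FD_CACHE_SIZE 64

static int fd_cache[FD_CACHE_SIZE];
static int n_cached = 0;

/* set from the handler, acted on in the main loop */
static volatile sig_atomic_t release_requested = 0;

static void
sigusr1_handler(int signo)
{
    (void) signo;
    release_requested = 1;      /* async-signal-safe: only set a flag */
}

/* close the oldest half of our cached descriptors */
static void
release_some_fds(void)
{
    int ntoclose = n_cached / 2;
    int i;

    for (i = 0; i < ntoclose; i++)
        close(fd_cache[i]);
    for (i = ntoclose; i < n_cached; i++)
        fd_cache[i - ntoclose] = fd_cache[i];
    n_cached -= ntoclose;
    release_requested = 0;
}

/* open(), and on FD-table exhaustion ask everyone to give some back */
static int
open_with_backoff(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0 && (errno == ENFILE || errno == EMFILE))
    {
        kill(0, SIGUSR1);       /* broadcast to our whole process group */
        release_some_fds();     /* and free some of our own right away */
        fd = open(path, O_RDONLY);
    }
    return fd;
}

int
main(void)
{
    struct sigaction sa;

    sa.sa_handler = sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;)
    {
        if (release_requested)  /* a sibling hit ENFILE */
            release_some_fds();

        int fd = open_with_backoff("/etc/hostname");
        if (fd >= 0 && n_cached < FD_CACHE_SIZE)
            fd_cache[n_cached++] = fd;
        else if (fd >= 0)
            close(fd);
        sleep(1);
    }
    return 0;
}

In real backends the flag check would presumably live somewhere like
the existing interrupt-check points, and the broadcast would go through
something like our multiplexed-signal machinery rather than a raw
process-group kill(), but those are details.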