"Raschick, Hartmut" <Hartmut.Raschick@xxxxxxxxxxx> writes:
> recently we have seen a lot of occurrences of "out of file descriptors:
> Too many open files; release and retry" in our postgres log files, every
> night when a "vacuum full analyze" is run. After some digging into the
> code we found that postgres potentially tries to open as many as a
> pre-determined maximum number of file descriptors when vacuuming. That
> number is the lesser of the one from the configuration file
> (max_files_per_process) and the one determined at start-up by
> "src/backend/storage/file/fd.c::count_usable_fds()". Under Solaris now,
> it would seem, finding out that number via dup(0) is not sufficient, as
> the actual number of interest might be/is the number of usable stream
> file descriptors (up until Solaris 10, at least). Also, closing the last
> recently used file descriptor might therefore not solve a temporary
> problem (as something below 256 is needed). Now, this can be fixed by
> setting/leaving the descriptor limit at 256 or changing the
> postgresql.conf setting accordingly. Still, the function for determining
> the max number is not working as intended under Solaris, it would
> appear. One might try using fopen() instead of dup() or have a different
> handling for stream and normal file descriptors (including moving
> standard file descriptors to above 255 to leave room for stream
> ones). Maybe though, all this is not worth the effort; then it might
> perhaps be a good idea to mention the limitations/specialties in the
> platform specific notes (e.g. have u/limit at 256 maximum).

TBH this sounds like unfounded speculation.  AFAIK a Postgres backend
will not open anything but regular files after its initial startup.
I'm not sure what a "stream" is on Solaris, but guessing that it refers
to pipes or sockets, I don't think we have a problem with an OS
restriction that those be below FD 256.
In any case, if we did, it would presumably show up as hard errors,
not release-and-retry events.  Our usual experience is that you get
release-and-retry log messages when the OS is up against the system-wide
open-file limit rather than the per-process limit (ie, the underlying
error code is ENFILE not EMFILE).  I don't know exactly how Solaris'
strerror() spells those codes, so it's difficult to tell from your
reported log message which case is happening.

If it is the system-wide limit that's at issue, then of course the
dup(0) loop isn't likely to find it, and adjusting max_files_per_process
(or maybe better, reducing max_connections) is the expected solution.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general