Re: What is the best way to handle "too many open files" errors?




It was thus said that the Great André Warnier once stated:
> 
> Another thing : it looks from your lsof list, that you are using the 
> Apache "prefork" model.
> I don't remember precisely your configuration or the kind of load or 
> processes you are running, but you might try the "worker" (threaded) 
> model instead.  I am not entirely sure, but I believe that this will 
> also reduce the total number of files opened on your system, as threads 
> may share a lot more in that respect, than individual child processes do.

  Switching to the threaded model won't help if you are running out of open
files---the open file limit is *per process* (more on that below).  And yes,
a child process will inherit all open files from the parent process, but
once the child process is created, it can close any open files without
affecting the parent (and by the same token, the parent can close files
without affecting any existing children).  
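Here is a minimal Python sketch of the inheritance point. It is POSIX only because it relies on os.fork(), and the function name is my own, hypothetical one. The child closes both ends of a pipe it inherited, and then the parent checks that its own copies still work:

```python
import os

def child_close_leaves_parent_fd_open() -> bool:
    """Fork a child that closes inherited pipe descriptors, then check
    that the parent's copies of those descriptors are still usable."""
    r, w = os.pipe()
    pid = os.fork()  # fork() copies every descriptor; Python's "non-inheritable"
                     # flag only matters across exec(), not fork()
    if pid == 0:
        # Child: closing these only removes entries from the child's own table.
        os.close(r)
        os.close(w)
        os._exit(0)
    os.waitpid(pid, 0)
    # Parent: its descriptors are untouched, so a write/read round trip works.
    os.write(w, b"still open")
    data = os.read(r, 64)
    os.close(r)
    os.close(w)
    return data == b"still open"

if __name__ == "__main__":
    print(child_close_leaves_parent_fd_open())
```

On Linux and other Unix-like systems this should print True. fork() gives the child its own copy of the descriptor table, so a close() in the child drops only the child's entries.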

> But I would wait a few hours for a real expert to comment, which I'm 
> sure one will do if I wrote something really stupid above.

  First off, "ulimit -n" will report back the number of open files a
*PROCESS* can have open---this isn't system wide, but per-process.  It's an
important distinction.
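To show that the limit really is a per-process attribute, here is a small Python sketch for Linux or another Unix. The resource module is standard library, but the helper names are hypothetical. It reads RLIMIT_NOFILE (the soft value is what "ulimit -n" prints). It then has a forked child lower its own soft limit and confirms that the parent's limit has not changed:

```python
import os
import resource

def nofile_limits():
    """(soft, hard) RLIMIT_NOFILE of the calling process; `ulimit -n` shows the soft one."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def child_limit_change_is_private(new_soft=64):
    """Fork a child that lowers its own soft limit to new_soft.
    Returns (child_saw_change, parent_unchanged)."""
    before = nofile_limits()
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            # Lowering the soft limit is always allowed (as long as it stays <= hard).
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
            if resource.getrlimit(resource.RLIMIT_NOFILE)[0] == new_soft:
                code = 0
        finally:
            os._exit(code)  # exit the child immediately, reporting success via status
    _, status = os.waitpid(pid, 0)
    child_saw_change = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return child_saw_change, nofile_limits() == before

if __name__ == "__main__":
    print("soft/hard:", nofile_limits())
    print("child changed / parent unchanged:", child_limit_change_is_private())
```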

  Second, lsof does report all files used by a process, but that isn't the
whole story.  For instance, here is the lsof output for one instance of
Apache on my development server:

httpd     17155  apache  cwd       DIR      253,0     4096          2 /
httpd     17155  apache  rtd       DIR      253,0     4096          2 /
httpd     17155  apache  txt       REG      253,0   259488    5300352 /usr/sbin/httpd
httpd     17155  apache  mem       REG      253,0    50748   18187232 /lib/tls/librt-2.3.4.so
httpd     17155  apache  mem       REG      253,0    28544   18187230 /lib/libcrypt-2.3.4.so
httpd     17155  apache  mem       REG      253,0  1525004   18187219 /lib/tls/libc-2.3.4.so
httpd     17155  apache  mem       REG      253,0    81184   18187224 /lib/libresolv-2.3.4.so
httpd     17155  apache  mem       REG      253,0   213600   18187228 /lib/libssl.so.0.9.7a
httpd     17155  apache  mem       REG      253,0     7004   18186360 /lib/libcom_err.so.2.1
httpd     17155  apache  mem       REG      253,0    82320    5303493 /usr/lib/libsasl2.so.2.0.19

... a whole mess of output deleted

httpd     17155  apache  mem       REG      253,0  1261824    5345160 /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so
httpd     17155  apache  mem       REG      253,0   804084    6424738 /usr/lib/libstdc++.so.6.0.3
httpd     17155  apache  DEL       REG        0,6              422353 /dev/zero
httpd     17155  apache  DEL       REG        0,6               37541 /dev/zero
httpd     17155  apache  DEL       REG        0,6              422347 /dev/zero
httpd     17155  apache    0r      CHR        1,3                2029 /dev/null
httpd     17155  apache    1w      CHR        1,3                2029 /dev/null
httpd     17155  apache    2w      REG      253,0     8426    2360724 /var/log/httpd/error_log.1
httpd     17155  apache    3u     IPv6      37481                 TCP *:http (LISTEN)
httpd     17155  apache    4u     IPv6      37483                 TCP *:https (LISTEN)
httpd     17155  apache    5r     FIFO        0,7              422346 pipe
httpd     17155  apache    6w     FIFO        0,7              422346 pipe
httpd     17155  apache    7w      REG      253,0     8426    2360724 /var/log/httpd/error_log.1
httpd     17155  apache    8w      REG      253,0   262156    2360561 /var/log/httpd/access_log.1
httpd     17155  apache    9w      REG      253,0    15126   11783673 /etc/httpd/www.roswell.area51
httpd     17155  apache   10w      REG      253,0   151095   11782872 /etc/httpd/wwwtest.roswell.area51
httpd     17155  apache   11w      REG      253,0     3046   11781743 /etc/httpd/s-secure.roswell.area51
httpd     17155  apache   12w      REG      253,0     4263   11783866 /etc/httpd/secure.roswell.area51

It's 146 lines, but it's not 146 "open" files.  Since Linux can page
directly from executables and libraries, technically, they're "open" in the
sense that yes, the kernel is reading from them as the program runs, but
they're not counted against the "ulimit -n" limit (unless I'm terribly
mistaken, and I could be, but bear with me for a bit).  You can see such
memory mappings if you check the maps file in the process's directory under
/proc:

[root@lucy 17155]# pwd
/proc/17155
[root@lucy 17155]# more maps
00111000-00119000 r-xp 00000000 fd:00 18187232   /lib/tls/librt-2.3.4.so
00119000-0011a000 r--p 00007000 fd:00 18187232   /lib/tls/librt-2.3.4.so
0011a000-0011b000 rw-p 00008000 fd:00 18187232   /lib/tls/librt-2.3.4.so
0011b000-00125000 rw-p 0011b000 00:00 0 
00125000-0012a000 r-xp 00000000 fd:00 18187230   /lib/libcrypt-2.3.4.so
0012a000-0012b000 r--p 00004000 fd:00 18187230   /lib/libcrypt-2.3.4.so
0012b000-0012c000 rw-p 00005000 fd:00 18187230   /lib/libcrypt-2.3.4.so
0012c000-00153000 rw-p 0012c000 00:00 0 
00153000-00278000 r-xp 00000000 fd:00 18187219   /lib/tls/libc-2.3.4.so
00278000-0027a000 r--p 00124000 fd:00 18187219   /lib/tls/libc-2.3.4.so
0027a000-0027c000 rw-p 00126000 fd:00 18187219   /lib/tls/libc-2.3.4.so
0027c000-0027e000 rw-p 0027c000 00:00 0 
0027e000-0028d000 r-xp 00000000 fd:00 18187224   /lib/libresolv-2.3.4.so
0028d000-0028e000 r--p 0000f000 fd:00 18187224   /lib/libresolv-2.3.4.so
0028e000-0028f000 rw-p 00010000 fd:00 18187224   /lib/libresolv-2.3.4.so
0028f000-00291000 rw-p 0028f000 00:00 0 
00291000-002c2000 r-xp 00000000 fd:00 18187228   /lib/libssl.so.0.9.7a
002c2000-002c5000 rw-p 00031000 fd:00 18187228   /lib/libssl.so.0.9.7a
002c5000-002c7000 r-xp 00000000 fd:00 18186360   /lib/libcom_err.so.2.1
002c7000-002c8000 rw-p 00001000 fd:00 18186360   /lib/libcom_err.so.2.1
... rest snipped
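One rough way to see the difference from a script is the Python sketch below. It is Linux only because it reads /proc, and the helper names are hypothetical. It collects the file-backed paths in /proc/<pid>/maps and subtracts whatever the entries in /proc/<pid>/fd point to. What is left over (typically the executable and its shared libraries) is mapped into memory without holding a descriptor. That fits the idea that those mappings don't count against "ulimit -n":

```python
import os

def mapped_files(pid="self"):
    """File-backed paths in /proc/<pid>/maps (anonymous and [heap]/[stack] entries skipped)."""
    paths = set()
    with open(f"/proc/{pid}/maps") as fh:
        for line in fh:
            # address perms offset dev inode [pathname]; the pathname may contain spaces
            fields = line.rstrip("\n").split(maxsplit=5)
            if len(fields) == 6 and fields[5].startswith("/"):
                paths.add(fields[5])
    return paths

def open_fd_targets(pid="self"):
    """Targets of the entries in /proc/<pid>/fd, i.e. descriptors actually open()ed."""
    fd_dir = f"/proc/{pid}/fd"
    targets = set()
    for name in os.listdir(fd_dir):
        try:
            targets.add(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            pass  # descriptor went away between listdir() and readlink()
    return targets

def mapped_but_not_open(pid="self"):
    """Files the process has mapped but does not hold an open descriptor for."""
    return mapped_files(pid) - open_fd_targets(pid)

if __name__ == "__main__":
    for path in sorted(mapped_but_not_open()):
        print(path)
```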

For files that are actually "opened" (as in, via the open() system call),
you can check the fd subdirectory of the same /proc entry:

[root@lucy fd]# pwd
/proc/17155/fd
[root@lucy fd]# ll
total 13
lr-x------  1 root root 64 May 20 17:46 0 -> /dev/null
l-wx------  1 root root 64 May 20 17:46 1 -> /dev/null
l-wx------  1 root root 64 May 20 17:46 10 -> /etc/httpd/wwwtest.roswell.area51
l-wx------  1 root root 64 May 20 17:46 11 -> /etc/httpd/s-secure.roswell.area51
l-wx------  1 root root 64 May 20 17:46 12 -> /etc/httpd/secure.roswell.area51
l-wx------  1 root root 64 May 20 17:46 2 -> /var/log/httpd/error_log.1
lrwx------  1 root root 64 May 20 17:46 3 -> socket:[37481]
lrwx------  1 root root 64 May 20 17:46 4 -> socket:[37483]
lr-x------  1 root root 64 May 20 17:46 5 -> pipe:[422346]
l-wx------  1 root root 64 May 20 17:46 6 -> pipe:[422346]
l-wx------  1 root root 64 May 20 17:46 7 -> /var/log/httpd/error_log.1
l-wx------  1 root root 64 May 20 17:46 8 -> /var/log/httpd/access_log.1
l-wx------  1 root root 64 May 20 17:46 9 -> /etc/httpd/www.roswell.area51

These are the files that have actually been "open()"ed by the program (and I
see that logrotate on the development system is still borked, but I
digress).  Files 0 and 1 are STDIN and STDOUT respectively, and since this
is the webserver, they're remapped to '/dev/null'.  File 2 is STDERR and
that's redirected to '/var/log/httpd/error_log.1' (should be error_log, but
like I said, there's something wrong with logrotate on that system).  Files
3 and 4 are the listening sockets (ports 80 and 443---if you cross-reference
the lsof output, you can find out which socket is listening on which
port), and well, I could go on (the two pipes are probably there for CGIs,
and you have the various config and log files), but I think you get the idea.

In any case, the open()ed files are what count against the limit, and they
are what you should be concerned with.  There are several ways you can
proceed:

	1. You can increase the open files limit with ulimit.

	2. You can consolidate all log files into two (access_log and error_log).

	3. You can consolidate all configuration files into one large file.

	4. You can check which Apache modules keep files open and adjust them accordingly.

	5. Some combination of the above.
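For option 1, running "ulimit -n N" in the shell amounts to a setrlimit() call on RLIMIT_NOFILE. Below is a minimal Python sketch of the same idea (the function name is hypothetical). It raises the soft limit toward a requested value, capped at the hard limit, which an unprivileged process is allowed to do. Raising the hard limit itself needs root (CAP_SYS_RESOURCE on Linux). Children inherit limits across fork() and exec(), so for Apache the limit has to be raised in whatever starts httpd, before it spawns any children:

```python
import resource

def raise_nofile_soft_limit(wanted):
    """Raise the soft RLIMIT_NOFILE toward `wanted`, never above the hard
    limit and never lowering it.  Returns the resulting (soft, hard) pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # unprivileged processes can't exceed the hard limit
    if wanted <= soft:
        return soft, hard  # already at least as high as requested
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(4096))
```

The function never lowers the limit, so calling it with a small value is harmless. On some systems (macOS, for instance) the kernel enforces its own per-process maximum, so setrlimit() may still raise ValueError for very large requests.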

  But what you actually do depends upon what you are doing with your
webserver (whenever I've encountered this problem, I first consolidate error
logs, then increase ulimit).

  -spc (Hope this helps some ... )


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx

