Re: HELP: apache 2.2.17 creating zombies that are increasing server load

Hi Martin,

Thanks for the detailed information.

The observed zombies are threads in Apache child processes. Those
processes (here PID 16042) are actually in the process of shutting down,
either due to a web server restart or due to the MPM configuration (like
MaxRequestsPerChild or the spare process configuration).

Unfortunately, one of the threads falls into a non-terminating loop during
shutdown, which consumes a lot of CPU and prevents the process from
exiting. So the real problem is this looping thread:

> -----------------  lwp# 24 / thread# 24  --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)

Problems like that are unfortunately not easy to debug.
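
One way to check whether lwp 24 is really stuck in the same place is to
take a few pstack samples some seconds apart and compare only that
thread's frames. Below is a minimal sketch, assuming a POSIX awk (on
Solaris 10 that probably means nawk or /usr/xpg4/bin/awk rather than
/usr/bin/awk). The helper name lwp_stack and the /tmp file names are made
up for illustration.

```shell
# lwp_stack LWP_ID
# Reads pstack output on stdin and prints only the frames that belong
# to the given LWP. (Hypothetical helper, not part of any Solaris tool.)
lwp_stack() {
    awk -v id="$1" '
        # A separator such as
        #   "-----------------  lwp# 24 / thread# 24  ----"
        # starts a new thread block; field 3 is the LWP id.
        /^-+ +lwp# / { show = ($3 == id); next }
        show { print }
    '
}

# Possible usage (PID and LWP id taken from the output in this thread):
#   pstack 16042 > /tmp/pstack.1; sleep 5; pstack 16042 > /tmp/pstack.2
#   lwp_stack 24 < /tmp/pstack.1 > /tmp/lwp24.1
#   lwp_stack 24 < /tmp/pstack.2 > /tmp/lwp24.2
#   diff /tmp/lwp24.1 /tmp/lwp24.2
```

If the frames stay identical across samples while the CPU time of that
LWP keeps growing, that would be consistent with a loop inside the
cleanup code, although it does not prove it.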

Do you use any third-party modules that did not come bundled with
Apache? Your config doesn't indicate it, but I'm asking to double check,
because e.g. "ldd" lists OpenSSL libs without mod_ssl being loaded in
the config. It might be that you compiled modules into httpd statically.
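
Statically compiled modules can be listed with "httpd -l". As a rough
sketch, assuming the usual output format (a "Compiled in modules:" header
followed by one indented source file name per line), the small filter
below prints just the module names so they can be searched. The function
name static_modules is made up for illustration.

```shell
# static_modules
# Reads `httpd -l` output on stdin and prints one compiled-in module
# source name (e.g. core.c, mod_ssl.c) per line, skipping the header.
static_modules() {
    awk '$1 ~ /\.c$/ { print $1 }'
}

# Possible usage (binary path taken from the output in this thread):
#   /web/apache2-prod-showcase_second/bin/httpd -l | static_modules | grep -i ssl
```

If mod_ssl.c shows up there, the OpenSSL libraries are explained by a
statically linked mod_ssl rather than by a third-party module.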

Are there any error messages in the error_log?

Can you reproduce the problem? Even on a test system?

Although I'm not aware of any directly related fixes, a good first step
might be to switch to 2.2.20 (or 2.2.21, which will likely be released
in a few days) and apr 1.4.5 / apr-util 1.3.12, in order to start
debugging from recent versions.

Regards,

Rainer

On 07.09.2011 22:59, Martin, Jeff wrote:
> Hello,
> I have a Solaris 10 server running apache 2.2.17 and on a weekly basis
> it's creating zombies and increasing the load to the point where we have
> to restart it every Thursday night. There are 6 apache instances running
> on this box but this is the only one seeing the issue. There have been
> no changes to the box that I or the developers are aware of.
> I've included a lot of output as I'm not sure what will be helpful and
> what won't. Any info or steps to resolve this are most appreciated. TIA.
> Jeff
> 
> bash-3.00# ulimit -a
> core file size        (blocks, -c) unlimited
> data seg size         (kbytes, -d) unlimited
> file size             (blocks, -f) unlimited
> open files                    (-n) 256
> pipe size          (512 bytes, -p) 10
> stack size            (kbytes, -s) 8192
> cpu time             (seconds, -t) unlimited
> max user processes            (-u) 29995
> virtual memory        (kbytes, -v) unlimited
> 
> bash-3.00# netstat -an|grep 172.23.181.34.80|wc -l
>     3438
> 
> bash-3.00# uptime
>   1:43pm  up 343 day(s),  2:59,  2 users,  load average: 4.41, 4.50, 4.39
> 
> SunOS 5.10 Generic_142909-17 sun4v sparc SUNW,SPARC-Enterprise-T5120
> 
> httpd.conf
> ServerRoot "/web/apache2-prod-showcase_second"
> 
> Listen 172.23.181.34:80
> 
> LoadModule headers_module modules/mod_headers.so
> LoadModule rewrite_module modules/mod_rewrite.so
> 
> <IfModule !mpm_netware_module>
> <IfModule !mpm_winnt_module>
> 
> User csdrd
> Group daemon
> 
> </IfModule>
> </IfModule>
> 
> ServerAdmin webmaster@xxxxxxxxxxxx
> 
> ServerName xx.xxxxx.com
> 
> DocumentRoot "/apps/doc-root"
> 
> ErrorLog "logs/error_log"
> LogLevel warn
> 
> DefaultType text/plain
> 
> # Cache control
> ExpiresActive   On
> ExpiresByType   image/gif       "access plus 1 weeks"
> ExpiresByType   image/jpg       "access plus 1 weeks"
> ExpiresByType   image/jpeg       "access plus 1 weeks"
> ExpiresByType   application/x-shockwave-flash       "access plus 1 weeks"
> ExpiresByType   image/png       "access plus 1 weeks"
> FileETag none
> 
> ProxyRequests Off
> ProxyPreserveHost On
> 
> <Proxy *>
>         Order deny,allow
>         Deny from all
>         Allow from all
> </Proxy>
> 
> ProxyPass /showcase/explore balancer://exploreutc stickysession=JSESSIONID|jsessionid timeout=5 lbmethod=byrequests nofailover=Off
> # Port 8180 service bind
> <Proxy balancer://exploreutc>
>         BalancerMember http://172.22.81.99:8080/utc route=host3
>         BalancerMember http://172.22.81.100:8080/utc route=host4
>         BalancerMember http://172.22.81.99:8180/utc route=host3a
>         BalancerMember http://172.22.81.100:8180/utc route=host4a
> </Proxy>
> 
> <Directory />
>     Options FollowSymLinks
>     AllowOverride None
>     Order deny,allow
>     Deny from all
> </Directory>
> 
> <Directory "/apps/doc-root">
>     Options FollowSymLinks
>     AllowOverride All
>     Order allow,deny
>     Allow from all
> </Directory>
> 
> <Directory "/web/apache2-prod-showcase_second/cgi-bin">
>     AllowOverride None
>     Options None
>     Order allow,deny
>     Allow from all
> </Directory>
> 
> <FilesMatch "^\.ht">
>     Order allow,deny
>     Deny from all
>     Satisfy All
> </FilesMatch>
> 
> <IfModule dir_module>
>     DirectoryIndex index_explore.html
> </IfModule>
> 
> <IfModule log_config_module>
>     LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
>     LogFormat "%h %l %u %t \"%r\" %>s %b" common
>      
> <IfModule logio_module>
>       # You need to enable mod_logio.c to use %I and %O
>       LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
>     </IfModule>
> </IfModule>
> 
> <IfModule alias_module>
>     ScriptAlias /cgi-bin/ "/web/apache2-prod-showcase_second/cgi-bin/"
> </IfModule>
> 
> <IfModule cgid_module>
> </IfModule>
> 
> <IfModule mime_module>
>     TypesConfig conf/mime.types
>     AddType application/x-compress .Z
>     AddType application/x-gzip .gz .tgz
> </IfModule>
> 
> <IfModule ssl_module>
> SSLRandomSeed startup builtin
> SSLRandomSeed connect builtin
> </IfModule>
> 
> bash-3.00# ./httpd -S
> VirtualHost configuration:
> Syntax OK
> 
> bash-3.00# ./httpd -V
> Server version: Apache/2.2.17 (Unix)
> Server built:   Mar 16 2011 16:19:54
> Server's Module Magic Number: 20051115:25
> Server loaded:  APR 1.4.2, APR-Util 1.3.10
> Compiled using: APR 1.4.2, APR-Util 1.3.10
> Architecture:   32-bit
> Server MPM:     Worker
>   threaded:     yes (fixed thread count)
>     forked:     yes (variable process count)
> Server compiled with....
> -D APACHE_MPM_DIR="server/mpm/worker"
> -D APR_HAS_SENDFILE
> -D APR_HAS_MMAP
> -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
> -D APR_USE_PROC_PTHREAD_SERIALIZE
> -D APR_USE_PTHREAD_SERIALIZE
> -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
> -D APR_HAS_OTHER_CHILD
> -D AP_HAVE_RELIABLE_PIPED_LOGS
> -D DYNAMIC_MODULE_LIMIT=128
> -D HTTPD_ROOT="/web/apache2-prod-showcase_second"
> -D SUEXEC_BIN="/web/apache2-prod-showcase_second/bin/suexec"
> -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
> -D DEFAULT_ERRORLOG="logs/error_log"
> -D AP_TYPES_CONFIG_FILE="conf/mime.types"
> -D SERVER_CONFIG_FILE="conf/httpd.conf"
> 
> bash-3.00# pstack 7619
> 7619:   /web/apache2-prod-showcase_second/bin/httpd -k start
> fecccdbc pollsys  (ffbff868, 0, ffbff8d0, 0)
> fec68590 pselect  (ffbff868, fed34728, fed34728, 0, ffbff8d0, 0) + 1c8
> fec68908 select   (0, 0, 0, 0, ffbff938, 0) + a0
> ff0219b8 apr_sleep (0, f4240, ffbffa4c, 0, eb610, 11176) + 4c
> 0004aadc ap_wait_or_timeout (ffbffa4c, ffbffa48, ffbffad0, eb610, dd000, e0400) + 60
> 0009a764 ap_mpm_run (fead01d8, 1c, 0, 20, 6, 3e) + 218
> 0002fcc8 main     (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start   (0, 0, 0, 0, 0, 0) + 5c
> 
> 16042 csdrd      20M   16M cpu20   50    0   3:52:05 3.1% httpd/24
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/65
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/64
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/63
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/62
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/61
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/60
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/59
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/58
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/57
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/56
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/55
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/54
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/53
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/52
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/51
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/50
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/49
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/48
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/47
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/46
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/45
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/44
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/43
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/42
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/41
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/40
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/39
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/38
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/37
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/36
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/35
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/34
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/33
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/32
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/31
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/30
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/29
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/28
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/27
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/26
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/25
> 16042 csdrd      20M   16M sleep   59    0   0:00:00 0.0% httpd/1
> 
> bash-3.00# pstack 16042
> 16042:  /web/apache2-prod-showcase_second/bin/httpd -k start
> -----------------  lwp# 1 / thread# 1  --------------------
> feccd210 lwp_wait (18, ffbff7d4)
> fecc60f0 _thrp_join (18, 0, ffbff83c, 1, ffbff7d4, fed35900) + 34
> ff020778 apr_thread_join (ffbff8bc, 19aef8, 2, 0, 1, c6bf0) + c
> 00099f28 join_workers (54, 1b3ee8, 99ab8, 19a810, 0, 1) + ec
> 0009a27c child_main (7, 98e0c, 0, 0, fed35960, ff172a00) + 270
> 0009a45c make_child (ddc00, 7, 1, e0c00, dd000, e0400) + 128
> 0009ac8c ap_mpm_run (fead0198, 18, 0, 20, 1, 15) + 740
> 0002fcc8 main     (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start   (0, 0, 0, 0, 0, 0) + 5c
> -----------------  lwp# 24 / thread# 24  --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)
> -----------------  lwp# 25 / thread# 25  --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 26 / thread# 26  --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
>        ** zombie (exited, not detached, not yet joined) **
> <SNIP more of the same.....>
> 
> bash-3.00# pfiles 16042
> 16042:  /web/apache2-prod-showcase_second/bin/httpd -k start
>   Current rlimit: 65536 file descriptors
>    0: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>       O_RDONLY
>       /devices/pseudo/mm@0:null
>    1: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>       O_WRONLY|O_CREAT|O_TRUNC
>       /devices/pseudo/mm@0:null
>    2: S_IFREG mode:0644 dev:32,26 ino:110758 uid:0 gid:0 size:570041
>       O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
>       /web/apache2-prod-showcase_second/logs/error_log
>    4: S_IFDOOR mode:0444 dev:357,0 ino:42 uid:0 gid:0 size:0
>       O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to pid -1
>    5: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>       O_RDWR FD_CLOEXEC
>    6: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>       O_RDWR FD_CLOEXEC
>    7: S_IFREG mode:0644 dev:32,26 ino:110763 uid:0 gid:0 size:1240942649
>       O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
>       /web/apache2-prod-showcase_second/logs/access_log
>   18: S_IFSOCK mode:0666 dev:355,0 ino:10041 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 50949
>         peername: AF_INET 172.22.81.100  port: 8180
>   24: S_IFSOCK mode:0666 dev:355,0 ino:40577 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 51076
>         peername: AF_INET 172.22.81.99  port: 8180
>   48: S_IFSOCK mode:0666 dev:355,0 ino:41083 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 50927
>         peername: AF_INET 172.22.81.100  port: 8180
>   49: S_IFSOCK mode:0666 dev:355,0 ino:27268 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 51025
>         peername: AF_INET 172.22.81.99  port: 8180
>   51: S_IFSOCK mode:0666 dev:355,0 ino:10997 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 50900
>         peername: AF_INET 172.22.81.99  port: 8180
>            
> bash-3.00# ldd httpd
>         libssl.so.1.0.0 =>       /usr/local/ssl/lib/libssl.so.1.0.0
>         libcrypto.so.1.0.0 =>    /usr/local/ssl/lib/libcrypto.so.1.0.0
>         libdl.so.1 =>    /lib/libdl.so.1
>         libm.so.2 =>     /lib/libm.so.2
>         libaprutil-1.so.0 => /web/apache2-prod-showcase_second/lib/libaprutil-1.so.0
>         libexpat.so.0 => /web/apache2-prod-showcase_second/lib/libexpat.so.0
>         libiconv.so.2 =>         /usr/local/lib/libiconv.so.2
>         libapr-1.so.0 => /web/apache2-prod-showcase_second/lib/libapr-1.so.0
>         libuuid.so.1 =>  /lib/libuuid.so.1
>         libsendfile.so.1 =>      /lib/libsendfile.so.1
>         librt.so.1 =>    /lib/librt.so.1
>         libsocket.so.1 =>        /lib/libsocket.so.1
>         libnsl.so.1 =>   /lib/libnsl.so.1
>         libpthread.so.1 =>       /lib/libpthread.so.1
>         libc.so.1 =>     /lib/libc.so.1
>         libgcc_s.so.1 =>         /usr/local/lib/libgcc_s.so.1
>         libaio.so.1 =>   /lib/libaio.so.1
>         libmd.so.1 =>    /lib/libmd.so.1
>         libmp.so.2 =>    /lib/libmp.so.2
>         libscf.so.1 =>   /lib/libscf.so.1
>         libdoor.so.1 =>  /lib/libdoor.so.1
>         libuutil.so.1 =>         /lib/libuutil.so.1
>         libgen.so.1 =>   /lib/libgen.so.1
>         /platform/SUNW,SPARC-Enterprise-T5120/lib/libc_psr.so.1
>         /platform/SUNW,SPARC-Enterprise-T5120/lib/libmd_psr.so.1

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx


