Thank you for your help with this. We are pursuing an upgrade.

-----Original Message-----
From: Rainer Jung [mailto:rainer.jung@xxxxxxxxxxx]
Sent: Saturday, September 10, 2011 3:45 AM
To: users@xxxxxxxxxxxxxxxx
Cc: Martin, Jeff
Subject: Re: HELP: apache 2.2.17 creating zombies that are increasing server load

Hi Martin,

Thanks for the detailed information.

The observed zombies are threads in Apache child processes. Those processes (here PID 16042) are actually in the process of shutting down, either due to a web server restart or due to MPM configuration (like MaxRequestsPerChild or the spare process configuration).

Unfortunately one of the threads falls into a non-terminating loop during shutdown, which consumes lots of CPU and prevents the process from exiting.

So the real problem is this looping thread:

> ----------------- lwp# 24 / thread# 24 --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)

Problems like that are unfortunately not easy to debug.

Do you use any 3rd-party modules which did not come bundled with Apache? Your config doesn't indicate it, but I'm asking to double check, because e.g. "pfiles" lists OpenSSL libs without mod_ssl being loaded in the config. It might be that you compiled modules into httpd statically.

Any error message in the error_log?

Can you reproduce the problem? Even on a test system?

Although I'm not aware of any directly related fixes, it might be a good first step to switch to 2.2.20 (or 2.2.21, which will likely be released in a few days) and apr 1.4.5 / apr-util 1.3.12 in order to start debugging from recent versions.
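As an illustration of the diagnosis above, here is a minimal Python sketch that separates the zombie LWPs from the one still burning CPU. It assumes the per-LWP rows quoted in the original message come from something like `prstat -L` (the thread does not name the command) with the column order PID, user, size, RSS, state, priority, nice, time, CPU, process/LWP. The names `LwpRow`, `parse_row` and `busy_lwps` are made up for this example.

```python
from typing import List, NamedTuple


class LwpRow(NamedTuple):
    pid: int
    state: str
    cpu_time: str
    cpu_pct: float
    lwp_id: int


def parse_row(line: str) -> LwpRow:
    """Parse one per-LWP row; the column order is an assumption (see lead-in)."""
    # Drop a leading mail quote prefix ("> ") if the row was pasted from the thread.
    fields = line.lstrip("> ").split()
    pid, _user, _size, _rss, state, _pri, _nice, cpu_time, cpu_pct, proc_lwp = fields
    _name, lwp_id = proc_lwp.rsplit("/", 1)
    return LwpRow(int(pid), state, cpu_time, float(cpu_pct.rstrip("%")), int(lwp_id))


def busy_lwps(rows: List[LwpRow]) -> List[LwpRow]:
    """Return LWPs that are not zombies and are currently using CPU."""
    return [r for r in rows if r.state != "zombie" and r.cpu_pct > 0.0]


# A few rows copied from the original message.
SAMPLE = """\
> 16042 csdrd 20M 16M cpu20 50 0 3:52:05 3.1% httpd/24
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/65
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/25
> 16042 csdrd 20M 16M sleep 59 0 0:00:00 0.0% httpd/1
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
for r in busy_lwps(rows):
    print(f"PID {r.pid} LWP {r.lwp_id}: state={r.state} time={r.cpu_time} cpu={r.cpu_pct}%")
```

On the sample rows this singles out LWP 24, the same thread whose stack is shown above. The LWP id can then be matched against the `lwp# N` headers in the pstack output.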
Regards,

Rainer

On 07.09.2011 22:59, Martin, Jeff wrote:
> Hello,
> I have a Solaris 10 server running apache 2.2.17 and on a weekly basis
> its creating zombies and increasing the load to the point where we have
> to restart it every Thursday night. There are 6 apache instances running
> on this box but this is the only one seeing the issue. There have been
> no changes to the box that I am aware of or the developers are aware of.
> I've included a lot of output as I'm not sure what will be helpful and
> what won't. Any info or steps to resolve this is most appreciated. TIA.
> Jeff
>
> bash-3.00# ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> file size (blocks, -f) unlimited
> open files (-n) 256
> pipe size (512 bytes, -p) 10
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 29995
> virtual memory (kbytes, -v) unlimited
>
> bash-3.00# netstat -an|grep 172.23.181.34.80|wc -l
> 3438
>
> bash-3.00# uptime
> 1:43pm up 343 day(s), 2:59, 2 users, load average: 4.41, 4.50, 4.39
>
> SunOS 5.10 Generic_142909-17 sun4v sparc SUNW,SPARC-Enterprise-T5120
>
> httpd.conf
> ServerRoot "/web/apache2-prod-showcase_second"
>
> Listen 172.23.181.34:80
>
> LoadModule headers_module modules/mod_headers.so
> LoadModule rewrite_module modules/mod_rewrite.so
>
> <IfModule !mpm_netware_module>
> <IfModule !mpm_winnt_module>
>
> User csdrd
> Group daemon
>
> </IfModule>
> </IfModule>
>
> ServerAdmin webmaster@xxxxxxxxxxxx
>
> ServerName xx.xxxxx.com
>
> DocumentRoot "/apps/doc-root"
>
> ErrorLog "logs/error_log"
> LogLevel warn
>
> DefaultType text/plain
>
> # Cache control
> ExpiresActive On
> ExpiresByType image/gif "access plus 1 weeks"
> ExpiresByType image/jpg "access plus 1 weeks"
> ExpiresByType image/jpeg "access plus 1 weeks"
> ExpiresByType application/x-shockwave-flash "access plus 1 weeks"
> ExpiresByType image/png "access plus 1 weeks"
> FileETag none
>
> ProxyRequests Off
> ProxyPreserveHost On
>
> <Proxy *>
> Order deny,allow
> Deny from all
> Allow from all
> </Proxy>
>
> ProxyPass /showcase/explore balancer://exploreutc stickysession=JSESSIONID|jsessionid timeout=5 lbmethod=byrequests nofailover=Off
> # Port 8180 service bind
> <Proxy balancer://exploreutc>
> BalancerMember http://172.22.81.99:8080/utc route=host3
> BalancerMember http://172.22.81.100:8080/utc route=host4
> BalancerMember http://172.22.81.99:8180/utc route=host3a
> BalancerMember http://172.22.81.100:8180/utc route=host4a
> </Proxy>
>
> <Directory />
> Options FollowSymLinks
> AllowOverride None
> Order deny,allow
> Deny from all
> </Directory>
>
> <Directory "/apps/doc-root">
> Options FollowSymLinks
> AllowOverride All
> Order allow,deny
> Allow from all
> </Directory>
>
> <Directory "/web/apache2-prod-showcase_second/cgi-bin">
> AllowOverride None
> Options None
> Order allow,deny
> Allow from all
> </Directory>
>
> <FilesMatch "^\.ht">
> Order allow,deny
> Deny from all
> Satisfy All
> </FilesMatch>
>
> <IfModule dir_module>
> DirectoryIndex index_explore.html
> </IfModule>
>
> <IfModule log_config_module>
> LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
> LogFormat "%h %l %u %t \"%r\" %>s %b" common
>
> <IfModule logio_module>
> # You need to enable mod_logio.c to use %I and %O
> LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
> </IfModule>
> </IfModule>
>
> <IfModule alias_module>
> ScriptAlias /cgi-bin/ "/web/apache2-prod-showcase_second/cgi-bin/"
> </IfModule>
>
> <IfModule cgid_module>
> </IfModule>
>
> <IfModule mime_module>
> TypesConfig conf/mime.types
> AddType application/x-compress .Z
> AddType application/x-gzip .gz .tgz
> </IfModule>
>
> <IfModule ssl_module>
> SSLRandomSeed startup builtin
> SSLRandomSeed connect builtin
> </IfModule>
>
> bash-3.00# ./httpd -S
> VirtualHost configuration:
> Syntax OK
>
> bash-3.00# ./httpd -V
> Server version: Apache/2.2.17 (Unix)
> Server built: Mar 16 2011 16:19:54
> Server's Module Magic Number: 20051115:25
> Server loaded: APR 1.4.2, APR-Util 1.3.10
> Compiled using: APR 1.4.2, APR-Util 1.3.10
> Architecture: 32-bit
> Server MPM: Worker
> threaded: yes (fixed thread count)
> forked: yes (variable process count)
> Server compiled with....
> -D APACHE_MPM_DIR="server/mpm/worker"
> -D APR_HAS_SENDFILE
> -D APR_HAS_MMAP
> -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
> -D APR_USE_PROC_PTHREAD_SERIALIZE
> -D APR_USE_PTHREAD_SERIALIZE
> -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
> -D APR_HAS_OTHER_CHILD
> -D AP_HAVE_RELIABLE_PIPED_LOGS
> -D DYNAMIC_MODULE_LIMIT=128
> -D HTTPD_ROOT="/web/apache2-prod-showcase_second"
> -D SUEXEC_BIN="/web/apache2-prod-showcase_second/bin/suexec"
> -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
> -D DEFAULT_ERRORLOG="logs/error_log"
> -D AP_TYPES_CONFIG_FILE="conf/mime.types"
> -D SERVER_CONFIG_FILE="conf/httpd.conf"
>
> bash-3.00# pstack 7619
> 7619: /web/apache2-prod-showcase_second/bin/httpd -k start
> fecccdbc pollsys (ffbff868, 0, ffbff8d0, 0)
> fec68590 pselect (ffbff868, fed34728, fed34728, 0, ffbff8d0, 0) + 1c8
> fec68908 select (0, 0, 0, 0, ffbff938, 0) + a0
> ff0219b8 apr_sleep (0, f4240, ffbffa4c, 0, eb610, 11176) + 4c
> 0004aadc ap_wait_or_timeout (ffbffa4c, ffbffa48, ffbffad0, eb610, dd000, e0400) + 60
> 0009a764 ap_mpm_run (fead01d8, 1c, 0, 20, 6, 3e) + 218
> 0002fcc8 main (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start (0, 0, 0, 0, 0, 0) + 5c
>
> 16042 csdrd 20M 16M cpu20 50 0 3:52:05 3.1% httpd/24
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/65
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/64
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/63
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/62
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/61
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/60
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/59
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/58
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/57
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/56
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/55
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/54
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/53
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/52
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/51
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/50
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/49
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/48
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/47
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/46
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/45
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/44
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/43
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/42
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/41
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/40
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/39
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/38
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/37
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/36
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/35
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/34
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/33
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/32
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/31
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/30
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/29
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/28
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/27
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/26
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/25
> 16042 csdrd 20M 16M sleep 59 0 0:00:00 0.0% httpd/1
>
> bash-3.00# pstack 16042
> 16042: /web/apache2-prod-showcase_second/bin/httpd -k start
> ----------------- lwp# 1 / thread# 1 --------------------
> feccd210 lwp_wait (18, ffbff7d4)
> fecc60f0 _thrp_join (18, 0, ffbff83c, 1, ffbff7d4, fed35900) + 34
> ff020778 apr_thread_join (ffbff8bc, 19aef8, 2, 0, 1, c6bf0) + c
> 00099f28 join_workers (54, 1b3ee8, 99ab8, 19a810, 0, 1) + ec
> 0009a27c child_main (7, 98e0c, 0, 0, fed35960, ff172a00) + 270
> 0009a45c make_child (ddc00, 7, 1, e0c00, dd000, e0400) + 128
> 0009ac8c ap_mpm_run (fead0198, 18, 0, 20, 1, 15) + 740
> 0002fcc8 main (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start (0, 0, 0, 0, 0, 0) + 5c
> ----------------- lwp# 24 / thread# 24 --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)
> ----------------- lwp# 25 / thread# 25 --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
> ** zombie (exited, not detached, not yet joined) **
> ----------------- lwp# 26 / thread# 26 --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
> ** zombie (exited, not detached, not yet joined) **
> <SNIP more of the same.....>
>
> bash-3.00# pfiles 16042
> 16042: /web/apache2-prod-showcase_second/bin/httpd -k start
> Current rlimit: 65536 file descriptors
> 0: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>    O_RDONLY
>    /devices/pseudo/mm@0:null
> 1: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>    O_WRONLY|O_CREAT|O_TRUNC
>    /devices/pseudo/mm@0:null
> 2: S_IFREG mode:0644 dev:32,26 ino:110758 uid:0 gid:0 size:570041
>    O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
>    /web/apache2-prod-showcase_second/logs/error_log
> 4: S_IFDOOR mode:0444 dev:357,0 ino:42 uid:0 gid:0 size:0
>    O_RDONLY|O_LARGEFILE FD_CLOEXEC door to pid -1
> 5: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>    O_RDWR FD_CLOEXEC
> 6: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>    O_RDWR FD_CLOEXEC
> 7: S_IFREG mode:0644 dev:32,26 ino:110763 uid:0 gid:0 size:1240942649
>    O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
>    /web/apache2-prod-showcase_second/logs/access_log
> 18: S_IFSOCK mode:0666 dev:355,0 ino:10041 uid:0 gid:0 size:0
>    O_RDWR|O_NONBLOCK FD_CLOEXEC
>    SOCK_STREAM
>    SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>    sockname: AF_INET 172.22.81.122 port: 50949
>    peername: AF_INET 172.22.81.100 port: 8180
> 24: S_IFSOCK mode:0666 dev:355,0 ino:40577 uid:0 gid:0 size:0
>    O_RDWR|O_NONBLOCK FD_CLOEXEC
>    SOCK_STREAM
>    SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>    sockname: AF_INET 172.22.81.122 port: 51076
>    peername: AF_INET 172.22.81.99 port: 8180
> 48: S_IFSOCK mode:0666 dev:355,0 ino:41083 uid:0 gid:0 size:0
>    O_RDWR|O_NONBLOCK FD_CLOEXEC
>    SOCK_STREAM
>    SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>    sockname: AF_INET 172.22.81.122 port: 50927
>    peername: AF_INET 172.22.81.100 port: 8180
> 49: S_IFSOCK mode:0666 dev:355,0 ino:27268 uid:0 gid:0 size:0
>    O_RDWR|O_NONBLOCK FD_CLOEXEC
>    SOCK_STREAM
>    SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>    sockname: AF_INET 172.22.81.122 port: 51025
>    peername: AF_INET 172.22.81.99 port: 8180
> 51: S_IFSOCK mode:0666 dev:355,0 ino:10997 uid:0 gid:0 size:0
>    O_RDWR|O_NONBLOCK FD_CLOEXEC
>    SOCK_STREAM
>    SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>    sockname: AF_INET 172.22.81.122 port: 50900
>    peername: AF_INET 172.22.81.99 port: 8180
>
> bash-3.00# ldd httpd
> libssl.so.1.0.0 => /usr/local/ssl/lib/libssl.so.1.0.0
> libcrypto.so.1.0.0 => /usr/local/ssl/lib/libcrypto.so.1.0.0
> libdl.so.1 => /lib/libdl.so.1
> libm.so.2 => /lib/libm.so.2
> libaprutil-1.so.0 => /web/apache2-prod-showcase_second/lib/libaprutil-1.so.0
> libexpat.so.0 => /web/apache2-prod-showcase_second/lib/libexpat.so.0
> libiconv.so.2 => /usr/local/lib/libiconv.so.2
> libapr-1.so.0 => /web/apache2-prod-showcase_second/lib/libapr-1.so.0
> libuuid.so.1 => /lib/libuuid.so.1
> libsendfile.so.1 => /lib/libsendfile.so.1
> librt.so.1 => /lib/librt.so.1
> libsocket.so.1 => /lib/libsocket.so.1
> libnsl.so.1 => /lib/libnsl.so.1
> libpthread.so.1 => /lib/libpthread.so.1
> libc.so.1 => /lib/libc.so.1
> libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1
> libaio.so.1 => /lib/libaio.so.1
> libmd.so.1 => /lib/libmd.so.1
> libmp.so.2 => /lib/libmp.so.2
> libscf.so.1 => /lib/libscf.so.1
> libdoor.so.1 => /lib/libdoor.so.1
> libuutil.so.1 => /lib/libuutil.so.1
> libgen.so.1 => /lib/libgen.so.1
> /platform/SUNW,SPARC-Enterprise-T5120/lib/libc_psr.so.1
> /platform/SUNW,SPARC-Enterprise-T5120/lib/libmd_psr.so.1
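Rainer's question about statically compiled modules can be roughly cross-checked against the quoted httpd.conf: it uses ProxyPass and Expires* directives outside any <IfModule> block, yet only mod_headers and mod_rewrite are loaded with LoadModule, and "httpd -S" still reports "Syntax OK". The Python sketch below illustrates that reasoning. The directive-to-module map is a small hand-written assumption covering only a few directives from this thread, and the function names are invented for the example. Running `httpd -l` on the server lists the compiled-in modules and remains the authoritative check.

```python
import re
from typing import Iterator, List, Set

# Hand-written and deliberately tiny: directive name -> module identifier.
# This is an illustrative assumption, not a complete mapping.
DIRECTIVE_MODULE = {
    "ProxyPass": "proxy_module",
    "ProxyRequests": "proxy_module",
    "ProxyPreserveHost": "proxy_module",
    "ExpiresActive": "expires_module",
    "ExpiresByType": "expires_module",
}


def loaded_modules(conf: str) -> Set[str]:
    """Module identifiers loaded dynamically via LoadModule."""
    pattern = re.compile(r"^\s*LoadModule\s+(\S+)\s+\S+", re.MULTILINE)
    return {m.group(1) for m in pattern.finditer(conf)}


def unconditional_directives(conf: str) -> Iterator[str]:
    """Yield directive names that are not wrapped in any <IfModule> block."""
    depth = 0
    for raw in conf.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if re.match(r"<IfModule\b", line, re.IGNORECASE):
            depth += 1
        elif re.match(r"</IfModule>", line, re.IGNORECASE):
            depth -= 1
        elif depth == 0 and not line.startswith("<"):
            yield line.split()[0]


def probably_static(conf: str) -> List[str]:
    """Modules whose directives are used but which are not loaded via LoadModule."""
    needed = {
        DIRECTIVE_MODULE[d]
        for d in unconditional_directives(conf)
        if d in DIRECTIVE_MODULE
    }
    return sorted(needed - loaded_modules(conf))


# Excerpt of the configuration quoted in the thread.
SAMPLE_CONF = """\
LoadModule headers_module modules/mod_headers.so
LoadModule rewrite_module modules/mod_rewrite.so
ExpiresActive On
ProxyRequests Off
ProxyPass /showcase/explore balancer://exploreutc
<IfModule ssl_module>
SSLRandomSeed startup builtin
</IfModule>
"""

print(probably_static(SAMPLE_CONF))
```

For the excerpt this reports expires_module and proxy_module, which is consistent with a build that has modules compiled in. The SSL directives are skipped because they sit inside <IfModule ssl_module>, so the sketch cannot tell whether mod_ssl is present.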