Hi Martin,

Thanks for the detailed information.

The observed zombies are threads in Apache child processes. Those
processes (here PID 16042) are actually in the process of shutting down,
either because of a web server restart or because of the MPM configuration
(like MaxRequestsPerChild or the spare process configuration).

Unfortunately, one of the threads gets stuck in a non-terminating loop
during shutdown, which consumes a lot of CPU and prevents the process from
exiting. The main thread (lwp 1) is waiting in join_workers for that
thread, so the workers that have already exited stay around as unjoined
zombies. So the real problem is this looping thread:

> ----------------- lwp# 24 / thread# 24 --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)

Problems like that are unfortunately not easy to debug.

Do you use any third-party modules that did not come bundled with Apache?
Your config doesn't indicate it, but I'm asking to double check, because
e.g. "ldd" lists OpenSSL libs without mod_ssl being loaded in the config.
It might be that you compiled modules into httpd statically.

Are there any error messages in the error_log?

Can you reproduce the problem? Even on a test system?

Although I'm not aware of any directly related fixes, a good first step
might be to switch to 2.2.20 (or 2.2.21, which will likely be released in
a few days) and apr 1.4.5 / apr-util 1.3.12, in order to start debugging
from recent versions.

Regards,

Rainer

On 07.09.2011 22:59, Martin, Jeff wrote:
> Hello,
> I have a Solaris 10 server running apache 2.2.17 and on a weekly basis
> its creating zombies and increasing the load to the point where we have
> to restart it every Thursday night.
> There are 6 apache instances running
> on this box but this is the only one seeing the issue. There have been
> no changes to the box that I am aware of or the developers are aware of.
> I've included a lot of output as I'm not sure what will be helpful and
> what won't. Any info or steps to resolve this is most appreciated. TIA.
> Jeff
>
> bash-3.00# ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> file size (blocks, -f) unlimited
> open files (-n) 256
> pipe size (512 bytes, -p) 10
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 29995
> virtual memory (kbytes, -v) unlimited
>
> bash-3.00# netstat -an|grep 172.23.181.34.80|wc -l
> 3438
>
> bash-3.00# uptime
> 1:43pm up 343 day(s), 2:59, 2 users, load average: 4.41, 4.50, 4.39
>
> SunOS 5.10 Generic_142909-17 sun4v sparc SUNW,SPARC-Enterprise-T5120
>
> httpd.conf
> ServerRoot "/web/apache2-prod-showcase_second"
>
> Listen 172.23.181.34:80
>
> LoadModule headers_module modules/mod_headers.so
> LoadModule rewrite_module modules/mod_rewrite.so
>
> <IfModule !mpm_netware_module>
> <IfModule !mpm_winnt_module>
>
> User csdrd
> Group daemon
>
> </IfModule>
> </IfModule>
>
> ServerAdmin webmaster@xxxxxxxxxxxx
>
> ServerName xx.xxxxx.com
>
> DocumentRoot "/apps/doc-root"
>
> ErrorLog "logs/error_log"
> LogLevel warn
>
> DefaultType text/plain
>
> # Cache control
> ExpiresActive On
> ExpiresByType image/gif "access plus 1 weeks"
> ExpiresByType image/jpg "access plus 1 weeks"
> ExpiresByType image/jpeg "access plus 1 weeks"
> ExpiresByType application/x-shockwave-flash "access plus 1 weeks"
> ExpiresByType image/png "access plus 1 weeks"
> FileETag none
>
> ProxyRequests Off
> ProxyPreserveHost On
>
> <Proxy *>
> Order deny,allow
> Deny from all
> Allow from all
> </Proxy>
>
> ProxyPass /showcase/explore balancer://exploreutc stickysession=JSESSIONID|jsessionid timeout=5 lbmethod=byrequests nofailover=Off
> # Port 8180 service bind
> <Proxy balancer://exploreutc>
> BalancerMember http://172.22.81.99:8080/utc route=host3
> BalancerMember http://172.22.81.100:8080/utc route=host4
> BalancerMember http://172.22.81.99:8180/utc route=host3a
> BalancerMember http://172.22.81.100:8180/utc route=host4a
> </Proxy>
>
> <Directory />
> Options FollowSymLinks
> AllowOverride None
> Order deny,allow
> Deny from all
> </Directory>
>
> <Directory "/apps/doc-root">
> Options FollowSymLinks
> AllowOverride All
> Order allow,deny
> Allow from all
> </Directory>
>
> <Directory "/web/apache2-prod-showcase_second/cgi-bin">
> AllowOverride None
> Options None
> Order allow,deny
> Allow from all
> </Directory>
>
> <FilesMatch "^\.ht">
> Order allow,deny
> Deny from all
> Satisfy All
> </FilesMatch>
>
> <IfModule dir_module>
> DirectoryIndex index_explore.html
> </IfModule>
>
> <IfModule log_config_module>
> LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
> LogFormat "%h %l %u %t \"%r\" %>s %b" common
>
> <IfModule logio_module>
> # You need to enable mod_logio.c to use %I and %O
> LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
> </IfModule>
> </IfModule>
>
> <IfModule alias_module>
> ScriptAlias /cgi-bin/ "/web/apache2-prod-showcase_second/cgi-bin/"
> </IfModule>
>
> <IfModule cgid_module>
> </IfModule>
>
> <IfModule mime_module>
> TypesConfig conf/mime.types
> AddType application/x-compress .Z
> AddType application/x-gzip .gz .tgz
> </IfModule>
>
> <IfModule ssl_module>
> SSLRandomSeed startup builtin
> SSLRandomSeed connect builtin
> </IfModule>
>
> bash-3.00# ./httpd -S
> VirtualHost configuration:
> Syntax OK
>
> bash-3.00# ./httpd -V
> Server version: Apache/2.2.17 (Unix)
> Server built: Mar 16 2011 16:19:54
> Server's Module Magic Number: 20051115:25
> Server loaded: APR 1.4.2, APR-Util 1.3.10
> Compiled using: APR 1.4.2, APR-Util 1.3.10
> Architecture: 32-bit
> Server MPM: Worker
> threaded: yes (fixed thread count)
> forked: yes (variable process count)
> Server compiled with....
> -D APACHE_MPM_DIR="server/mpm/worker"
> -D APR_HAS_SENDFILE
> -D APR_HAS_MMAP
> -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
> -D APR_USE_PROC_PTHREAD_SERIALIZE
> -D APR_USE_PTHREAD_SERIALIZE
> -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
> -D APR_HAS_OTHER_CHILD
> -D AP_HAVE_RELIABLE_PIPED_LOGS
> -D DYNAMIC_MODULE_LIMIT=128
> -D HTTPD_ROOT="/web/apache2-prod-showcase_second"
> -D SUEXEC_BIN="/web/apache2-prod-showcase_second/bin/suexec"
> -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
> -D DEFAULT_ERRORLOG="logs/error_log"
> -D AP_TYPES_CONFIG_FILE="conf/mime.types"
> -D SERVER_CONFIG_FILE="conf/httpd.conf"
>
> bash-3.00# pstack 7619
> 7619: /web/apache2-prod-showcase_second/bin/httpd -k start
> fecccdbc pollsys (ffbff868, 0, ffbff8d0, 0)
> fec68590 pselect (ffbff868, fed34728, fed34728, 0, ffbff8d0, 0) + 1c8
> fec68908 select (0, 0, 0, 0, ffbff938, 0) + a0
> ff0219b8 apr_sleep (0, f4240, ffbffa4c, 0, eb610, 11176) + 4c
> 0004aadc ap_wait_or_timeout (ffbffa4c, ffbffa48, ffbffad0, eb610, dd000, e0400) + 60
> 0009a764 ap_mpm_run (fead01d8, 1c, 0, 20, 6, 3e) + 218
> 0002fcc8 main (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start (0, 0, 0, 0, 0, 0) + 5c
>
> 16042 csdrd 20M 16M cpu20 50 0 3:52:05 3.1% httpd/24
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/65
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/64
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/63
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/62
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/61
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/60
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/59
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/58
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/57
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/56
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/55
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/54
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/53
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/52
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/51
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/50
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/49
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/48
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/47
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/46
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/45
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/44
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/43
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/42
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/41
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/40
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/39
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/38
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/37
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/36
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/35
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/34
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/33
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/32
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/31
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/30
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/29
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/28
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/27
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/26
> 16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/25
> 16042 csdrd 20M 16M sleep 59 0 0:00:00 0.0% httpd/1
>
> bash-3.00# pstack 16042
> 16042: /web/apache2-prod-showcase_second/bin/httpd -k start
> ----------------- lwp# 1 / thread# 1 --------------------
> feccd210 lwp_wait (18, ffbff7d4)
> fecc60f0 _thrp_join (18, 0, ffbff83c, 1, ffbff7d4, fed35900) + 34
> ff020778 apr_thread_join (ffbff8bc, 19aef8, 2, 0, 1, c6bf0) + c
> 00099f28 join_workers (54, 1b3ee8, 99ab8, 19a810, 0, 1) + ec
> 0009a27c child_main (7, 98e0c, 0, 0, fed35960, ff172a00) + 270
> 0009a45c make_child (ddc00, 7, 1, e0c00, dd000, e0400) + 128
> 0009ac8c ap_mpm_run (fead0198, 18, 0, 20, 1, 15) + 740
> 0002fcc8 main (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start (0, 0, 0, 0, 0, 0) + 5c
> ----------------- lwp# 24 / thread# 24 --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)
> ----------------- lwp# 25 / thread# 25 --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
> ** zombie (exited, not detached, not yet joined) **
> ----------------- lwp# 26 / thread# 26 --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
> ** zombie (exited, not detached, not yet joined) **
> <SNIP more of the same.....>
>
> bash-3.00# pfiles 16042
> 16042: /web/apache2-prod-showcase_second/bin/httpd -k start
> Current rlimit: 65536 file descriptors
> 0: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>     O_RDONLY
>     /devices/pseudo/mm@0:null
> 1: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>     O_WRONLY|O_CREAT|O_TRUNC
>     /devices/pseudo/mm@0:null
> 2: S_IFREG mode:0644 dev:32,26 ino:110758 uid:0 gid:0 size:570041
>     O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
>     /web/apache2-prod-showcase_second/logs/error_log
> 4: S_IFDOOR mode:0444 dev:357,0 ino:42 uid:0 gid:0 size:0
>     O_RDONLY|O_LARGEFILE FD_CLOEXEC door to pid -1
> 5: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>     O_RDWR FD_CLOEXEC
> 6: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>     O_RDWR FD_CLOEXEC
> 7: S_IFREG mode:0644 dev:32,26 ino:110763 uid:0 gid:0 size:1240942649
>     O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
>     /web/apache2-prod-showcase_second/logs/access_log
> 18: S_IFSOCK mode:0666 dev:355,0 ino:10041 uid:0 gid:0 size:0
>     O_RDWR|O_NONBLOCK FD_CLOEXEC
>     SOCK_STREAM
>     SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>     sockname: AF_INET 172.22.81.122 port: 50949
>     peername: AF_INET 172.22.81.100 port: 8180
> 24: S_IFSOCK mode:0666 dev:355,0 ino:40577 uid:0 gid:0 size:0
>     O_RDWR|O_NONBLOCK FD_CLOEXEC
>     SOCK_STREAM
>     SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>     sockname: AF_INET 172.22.81.122 port: 51076
>     peername: AF_INET 172.22.81.99 port: 8180
> 48: S_IFSOCK mode:0666 dev:355,0 ino:41083 uid:0 gid:0 size:0
>     O_RDWR|O_NONBLOCK FD_CLOEXEC
>     SOCK_STREAM
>     SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>     sockname: AF_INET 172.22.81.122 port: 50927
>     peername: AF_INET 172.22.81.100 port: 8180
> 49: S_IFSOCK mode:0666 dev:355,0 ino:27268 uid:0 gid:0 size:0
>     O_RDWR|O_NONBLOCK FD_CLOEXEC
>     SOCK_STREAM
>     SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>     sockname: AF_INET 172.22.81.122 port: 51025
>     peername: AF_INET 172.22.81.99 port: 8180
> 51: S_IFSOCK mode:0666 dev:355,0 ino:10997 uid:0 gid:0 size:0
>     O_RDWR|O_NONBLOCK FD_CLOEXEC
>     SOCK_STREAM
>     SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>     sockname: AF_INET 172.22.81.122 port: 50900
>     peername: AF_INET 172.22.81.99 port: 8180
>
> bash-3.00# ldd httpd
> libssl.so.1.0.0 => /usr/local/ssl/lib/libssl.so.1.0.0
> libcrypto.so.1.0.0 => /usr/local/ssl/lib/libcrypto.so.1.0.0
> libdl.so.1 => /lib/libdl.so.1
> libm.so.2 => /lib/libm.so.2
> libaprutil-1.so.0 => /web/apache2-prod-showcase_second/lib/libaprutil-1.so.0
> libexpat.so.0 => /web/apache2-prod-showcase_second/lib/libexpat.so.0
> libiconv.so.2 => /usr/local/lib/libiconv.so.2
> libapr-1.so.0 => /web/apache2-prod-showcase_second/lib/libapr-1.so.0
> libuuid.so.1 => /lib/libuuid.so.1
> libsendfile.so.1 => /lib/libsendfile.so.1
> librt.so.1 => /lib/librt.so.1
> libsocket.so.1 => /lib/libsocket.so.1
> libnsl.so.1 => /lib/libnsl.so.1
> libpthread.so.1 => /lib/libpthread.so.1
> libc.so.1 => /lib/libc.so.1
> libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1
> libaio.so.1 => /lib/libaio.so.1
> libmd.so.1 => /lib/libmd.so.1
> libmp.so.2 => /lib/libmp.so.2
> libscf.so.1 => /lib/libscf.so.1
> libdoor.so.1 => /lib/libdoor.so.1
> libuutil.so.1 => /lib/libuutil.so.1
> libgen.so.1 => /lib/libgen.so.1
> /platform/SUNW,SPARC-Enterprise-T5120/lib/libc_psr.so.1
> /platform/SUNW,SPARC-Enterprise-T5120/lib/libmd_psr.so.1

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
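
If the problem shows up again, it may help to pick out the spinning thread automatically rather than by eye. The following is only a rough sketch, assuming the per-thread rows quoted in this thread come from something like `prstat -L` (the post does not name the command) and keep that column layout. The names `busy_lwps`, `ROW`, and `SAMPLE` are made up for illustration, and the sample rows are copied from the output above.

```python
import re

# Sample rows copied from the per-thread listing quoted in this thread.
SAMPLE = """\
16042 csdrd 20M 16M cpu20 50 0 3:52:05 3.1% httpd/24
16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/65
16042 csdrd 20M 16M zombie 0 - 0:00:00 0.0% httpd/25
16042 csdrd 20M 16M sleep 59 0 0:00:00 0.0% httpd/1
"""

# Assumed column order: PID USER SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<user>\S+)\s+\S+\s+\S+\s+(?P<state>\S+)\s+"
    r"\S+\s+\S+\s+(?P<time>[\d:]+)\s+(?P<cpu>[\d.]+)%\s+"
    r"(?P<proc>[^/\s]+)/(?P<lwp>\d+)\s*$"
)


def busy_lwps(text, min_cpu=1.0):
    """Return (pid, lwp, cpu_percent, state) tuples for rows at or above min_cpu."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip header, summary, or otherwise unparseable lines
        cpu = float(m.group("cpu"))
        if cpu >= min_cpu:
            hits.append(
                (int(m.group("pid")), int(m.group("lwp")), cpu, m.group("state"))
            )
    # Highest CPU consumer first.
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    for pid, lwp, cpu, state in busy_lwps(SAMPLE):
        # Print a hint for which single thread to inspect next.
        print(f"pid {pid} lwp {lwp}: {cpu}% ({state}), try: pstack {pid}/{lwp}")
```

On the sample rows this singles out lwp 24 of PID 16042, the same thread identified in the reply. As far as I know, Solaris pstack accepts the pid/lwpid form to dump one thread, but please check the man page on your release. For the separate question about statically compiled modules, `httpd -l` lists the modules compiled into the binary.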