We have four Windows Servers running Apache 2.4.27 acting as load balancers for our application server cluster, which is running Tomcat. Recently, we have started to experience a high number of crashes with the web servers. Within the Apache
error logs we see the following:

[Mon Jan 15 15:12:08.271099 2018] [mpm_winnt:notice] [pid 1696:tid 432] AH00428: Parent: child process 38240 exited with status 3221225477 -- Restarting.

Between the four web servers, we often see over a dozen such crashes a day, sometimes more, sometimes less. In some cases Apache crashes again only 5 minutes after the child process was restarted. The number of crashes goes down significantly
during the night and on weekends, but they still happen. As far as we can tell, we have not made any major changes to the configuration recently, and we have only started to experience this in the past few weeks.

We were able to get a core dump from one of the web servers as it was crashing. The following are some pieces extracted from it: FAULTING_IP:

Looking at the Windows Event Viewer, we see “libaprutil-1” and “libapr-1” as the faulting modules when the crashes occur. On rarer occasions, we see “ntdll” and “libhttpd” as the faulting modules.

We have tried increasing the thread stack size (based on similar reports online), but that has not helped. We’ve also enabled forensic logging to determine whether some sort of rogue request could be knocking us over, but nothing
seemed really out of place.

Is there anything we can do to determine what the root cause is?

Thanks
-Tim
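
A side note on the log line quoted above: the exit status 3221225477 is 0xC0000005 in hexadecimal, which is the Windows NTSTATUS code for an access violation. The following is a minimal Python sketch for decoding such lines. The helper name `decode_exit_status` and the small lookup table are hypothetical and only for illustration; the table is not an exhaustive list of NTSTATUS codes.

```python
import re

# Small, non-exhaustive lookup of well-known NTSTATUS values (illustrative only).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}

# Matches the mpm_winnt restart notice as it appears in the error log above.
CRASH_RE = re.compile(
    r"AH00428: Parent: child process (\d+) exited with status (\d+)"
)


def decode_exit_status(line):
    """Return (child_pid, status_hex, status_name), or None if the line does not match."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    child_pid = int(match.group(1))
    status = int(match.group(2))
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return child_pid, f"0x{status:08X}", name


log_line = (
    "[Mon Jan 15 15:12:08.271099 2018] [mpm_winnt:notice] [pid 1696:tid 432] "
    "AH00428: Parent: child process 38240 exited with status 3221225477 -- Restarting."
)
print(decode_exit_status(log_line))
# prints (38240, '0xC0000005', 'STATUS_ACCESS_VIOLATION')
```

Running the same helper over the full error log would show whether every crash carries the same status code or whether several different codes are involved.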