Re: Help: Apache Crashing Everyday (Apache Users)

Hi Luca,

Thanks for the details.

1. Our server's ulimit values are:

]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63714
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Please let me know whether the values are sufficient to allow at least 500 concurrent connections.
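
For what it's worth, here is a rough way to reason about that question. Under prefork each concurrent connection occupies one child process, so the value to compare against is "max user processes" (-u). The sketch below parses a few lines of the output above and computes the headroom. It assumes these shell limits are the ones that actually apply to the httpd user, which may not be the case; `headroom` and `other_processes` are names made up for this example.

```python
import re

# A few lines copied from the `ulimit -a` output above.
ULIMIT_OUTPUT = """\
open files (-n) 1024
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
"""

def parse_ulimit(text):
    """Map each ulimit flag letter (e.g. 'u') to an int, or None for 'unlimited'."""
    limits = {}
    for line in text.splitlines():
        m = re.search(r"\(.*?-(\w)\)\s+(\S+)\s*$", line)
        if m:
            flag, value = m.groups()
            limits[flag] = None if value == "unlimited" else int(value)
    return limits

def headroom(limits, wanted_children, other_processes=0):
    """Processes left under the -u limit after `wanted_children` prefork children.

    `other_processes` is a hypothetical count of non-httpd processes owned by
    the same user. Returns None when the limit is unlimited or unknown.
    """
    nproc = limits.get("u")
    if nproc is None:
        return None
    return nproc - other_processes - wanted_children

limits = parse_ulimit(ULIMIT_OUTPUT)
print("headroom for 500 children:", headroom(limits, 500))
print("headroom for MaxClients 3500:", headroom(limits, 3500))
```

A negative result means the process limit would be hit before that many children exist. On that reading, 500 connections fit under a limit of 1024, but the configured MaxClients of 3500 does not.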

2. Yes, I checked the mod_jk log when the hang happens, and I am getting the errors below continuously.

[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 24.843284
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_process_callback::jk_ajp_common.c (1788): Writing to client aborted or client network problems
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_service::jk_ajp_common.c (2447): (qu_prod_live_svr1) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1384): service failed, worker qu_prod_live_svr1 is in local error state
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1403): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance.
[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 19.170901
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] jk_handler::mod_jk.c (2608): Aborting connection for worker=loadbalancer
[Wed Apr 19 02:00:39 2017][16261:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /
[Wed Apr 19 02:00:40 2017][16308:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /

3. We will upgrade to 2.4.25. Could you please share an optimal mpm-event configuration that allows more concurrent users?

Thanks

Jay

On Tue, Apr 18, 2017 at 10:03 AM, Luca Toscano <toscano.luca@xxxxxxxxx> wrote:

Hi,

Some suggestions:

1) Check the RHEL ulimits applied to httpd. The error message "Resource temporarily unavailable: setuid: unable to change to uid" could mean that the maximum number of processes allowed by the OS has been reached. Raising that limit should allow you to spawn more httpd processes.

2) Have you checked when the "hang" happens? If you have long-lived connections and your httpd server reloads (for example, for log rotation), it might hang for a while waiting for the remaining connections to drain.

3) If possible, I'd consider upgrading httpd to >= 2.4.25 and using mpm-event (rather than prefork).

Hope that helps!

Luca

2017-04-16 13:18 GMT+02:00 Jayaram Ponnusamy <jayaram.ponnusamy@xxxxxxxxx>:
Dear All,

We were running our site on a PHP-based CMS tool earlier, and normally 20-30K users access our sites daily. But on the new system with Tomcat, we frequently face performance and availability issues. When I access the Tomcat URL directly, the page loads within 3 seconds, but through the web server URL it takes more than 9 seconds.

Also, each day I am seeing more and more of these in my error_log, and when the total children value reaches 999, Apache stops responding and only a server reboot brings the site back. We face this issue at least 4-5 times every day (we are using mod_jk to connect to Tomcat).

Kindly help with this.

I usually see this in my error_log:
[Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 4 idle, and 31 total children
[Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 20 total children
[Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 0 idle, and 28 total children
[Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 44 total children
We are using two Apache nodes connected to two Tomcat instances (with application-level clustering).
Apache Servers:
4-core, 64-bit RHEL system with 16 GB RAM (both servers)
Server version: Apache/2.2.21 (Unix)

httpd.conf
KeepAlive On
Timeout 300
MaxKeepAliveRequests 100
KeepAliveTimeout 15
<IfModule prefork.c>
StartServers 80
ServerLimit 3500
MaxClients 3500
MaxRequestsPerChild 0
</IfModule>

workers.properties
worker.list=loadbalancer,status
worker.qu_prod_live_svr.type=ajp13
worker.qu_prod_live_svr.host=cmsp1
worker.qu_prod_live_svr.port=8009
worker.qu_prod_live_svr.socket_keepalive=1
worker.qu_prod_live_svr.socket_timeout=300
worker.qu_prod_live_svr1.type=ajp13
worker.qu_prod_live_svr1.host=cmsp2
worker.qu_prod_live_svr1.port=8009
worker.qu_prod_live_svr1.socket_keepalive=1
worker.qu_prod_live_svr1.socket_timeout=300
worker.qu_prod_live_svr.lbfactor=1
worker.qu_prod_live_svr1.lbfactor=1
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=qu_prod_live_svr,qu_prod_live_svr1
worker.status.type=status

Tomcat Servers:
4-core, 64-bit RHEL system with 16 GB RAM (both servers)
Server version: Apache Tomcat/7.0.42
<Connector port="9090" protocol="HTTP/1.1" redirectPort="8443" URIEncoding="UTF-8" emptySessionPath="true" maxThreads="500" minSpareThreads="10" connectionTimeout="-1" />
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8" />
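
One thing that may be worth comparing here is how many AJP connections the Apache side can open against how many request threads the AJP connector has. The back-of-envelope sketch below rests on several assumptions: that with prefork each httpd child can hold one connection to each Tomcat, that the AJP connector uses a thread per connection, and that it runs with the Tomcat 7 default of maxThreads=200 because the connector above does not set it. The function name and the constant are made up for this example.

```python
# ASSUMPTION: Tomcat 7 default maxThreads, since the AJP connector above does not set it.
ASSUMED_AJP_MAX_THREADS = 200

def worst_case_ajp_connections(apache_nodes, max_clients_per_node):
    """Upper bound on AJP connections a single Tomcat could see.

    Assumes prefork (one connection per httpd child per backend) and that
    every child on every Apache node has talked to this Tomcat.
    """
    return apache_nodes * max_clients_per_node

# Values taken from the thread: two Apache nodes, MaxClients 3500 each.
demand = worst_case_ajp_connections(apache_nodes=2, max_clients_per_node=3500)
print("worst-case connections per Tomcat:", demand)
print("assumed AJP threads available:", ASSUMED_AJP_MAX_THREADS)
print("oversubscribed:", demand > ASSUMED_AJP_MAX_THREADS)
```

If those assumptions hold, the Apache side can ask for far more backend connections than Tomcat can serve at once, which would fit children piling up while they wait on the backend. It is only an upper bound, not a measurement.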

error_log:
[Sat Apr 15 21:52:36 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 839 total children
[Sat Apr 15 21:52:37 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 871 total children
[Sat Apr 15 21:52:38 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 903 total children
[Sat Apr 15 21:52:39 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 935 total children
[Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 967 total children
[Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 999 total children
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] Child 9351 returned a Fatal error... Apache is exiting!
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument: apr_global_mutex_lock(jk_log_lock) failed
[Sat Apr 15 21:53:06 2017] [error] mod_jk: jk_log_to_file
[Sat Apr 15 21:53:06 2017][8752:4177577728] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1150): (qu_prod_live_svr1) can't receive the response header message from tomcat, network problems or tomcat (10.11.11.32:8009) is down (errno=104)\n failed: Broken pipe
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument: apr_global_mutex_unlock(jk_log_lock) failed
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument: apr_global_mutex_lock(jk_log_lock) failed
[Sat Apr 15 21:53:06 2017] [error] mod_jk: jk_log_to_file [Sat Apr 15 21:53:06 2017][8752:4177577728] [error] ajp_get_reply::jk_ajp_common.c (1962): (qu_prod_live_svr1) Tomcat is down or refused connection. No response has been sent to the client (yet)\n failed: Broken pipe
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument: apr_global_mutex_unlock(jk_log_lock) failed
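
To see how the child count relates to the setuid failures, the log can be summarized with a small script. This is only a sketch: the sample lines are copied from the error_log above, and `summarize` is a name made up for this example. Against a real file you would pass the lines read from the log.

```python
import re

# Three lines copied verbatim from the error_log above.
SAMPLE = [
    "[Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 967 total children",
    "[Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 999 total children",
    "[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2",
]

BUSY = re.compile(r"and (\d+) total children")
SETUID = "setuid: unable to change to uid"

def summarize(lines):
    """Return the highest 'total children' value and the number of setuid failures."""
    totals = []
    setuid_failures = 0
    for line in lines:
        m = BUSY.search(line)
        if m:
            totals.append(int(m.group(1)))
        elif SETUID in line:
            setuid_failures += 1
    return {
        "peak_children": max(totals) if totals else 0,
        "setuid_failures": setuid_failures,
    }

print(summarize(SAMPLE))
```

In the sample, the setuid failure appears in the same second the count reaches 999, just below the "max user processes" value of 1024. That would be consistent with a per-user process limit being hit, though the log alone does not prove it.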

Thanks & Regards,
Jay