Re: Help troubleshooting performance issue, after "1000 total children" Apache no longer responds to HTTP requests. Not MaxClients issue?

Hello, PJ.

Perhaps your prefork settings are the cause of the issue.

Look, you have 80 StartServers and 120 MaxSpareServers, and with such settings, Apache can spawn 9600 (80*120) children.

However, your ServerLimit and MaxClients (3500) are far lower than that.

I've had similar issues when the number of children Apache could spawn was higher than the ServerLimit/MaxClients value.

Try raising the ServerLimit and MaxClients value to 9600 (make sure you have enough memory to do so) and check what happens. 

In case you can't afford such a high number of children, lower the values of StartServers and MaxSpareServers, but keep them in line with MaxClients and ServerLimit.
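
As a rough way to check the memory side of this, the sketch below multiplies a child count by a per-child memory figure. The 30 MB per child and the 64 GB of available memory are hypothetical placeholders, not measurements from PJ's server; the real per-child size would have to be measured on the machine (for example from the resident size of the running httpd processes).

```python
def required_memory_mb(children: int, per_child_mb: int) -> int:
    """Memory needed if every child uses per_child_mb megabytes."""
    return children * per_child_mb


def max_children_for_memory(available_mb: int, per_child_mb: int) -> int:
    """Largest child count that fits in available_mb megabytes."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    return available_mb // per_child_mb


# Hypothetical figures, for illustration only.
PER_CHILD_MB = 30
AVAILABLE_MB = 64 * 1024

print(required_memory_mb(9600, PER_CHILD_MB))
print(max_children_for_memory(AVAILABLE_MB, PER_CHILD_MB))
```

If the second number comes out below the MaxClients value you want, the lower number is the safer ceiling.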

Hope this helps.


Luis Alen

On Wed, May 2, 2012 at 11:38 AM, P J <pauljflists@xxxxxxxxx> wrote:
On Tue, May 1, 2012 at 7:26 AM, P J <pauljflists@xxxxxxxxx> wrote:
On Tue, May 1, 2012 at 7:22 AM, P J <pauljflists@xxxxxxxxx> wrote:
On Mon, Apr 30, 2012 at 10:37 AM, P J <pauljflists@xxxxxxxxx> wrote:

On Mon, Apr 30, 2012 at 9:13 AM, Alexandr Normuradov <normalex@xxxxxxxxx> wrote:
To troubleshoot that you should have
at least two additional outputs from

netstat -pant, with connections states
and
service httpd fullstatus, listing current state of all the apache procs/threads.
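
Since the netstat output on a busy server gets very long, a per-state count is usually easier to read than the raw listing. A minimal sketch, assuming the usual `netstat -pant` column layout where the state is the sixth column; the sample rows are hypothetical and not taken from the server in this thread.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count connections per TCP state in `netstat -pant` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol (tcp or tcp6); header rows do not.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample rows, for illustration only.
sample = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1234/httpd
tcp        0      0 10.0.0.5:80             10.0.0.9:51000          ESTABLISHED 1300/httpd
tcp        0      0 10.0.0.5:80             10.0.0.9:51001          TIME_WAIT   -
tcp        0      0 10.0.0.5:80             10.0.0.9:51002          ESTABLISHED 1301/httpd
"""

for state, count in count_tcp_states(sample).most_common():
    print(state, count)
```

A large share of one state (for example many connections waiting to close) points at a different problem than a large share of established connections.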

What applications is your Apache serving?
PHP? Is it mod_php, mod_python, mod_perl?

What is the access log for the most accessed vhost showing?
Any pattern of a slow, connection-consuming attack?
If there is, and all tasks are in the Keep Alive wait, then disable Keep
Alive and lower the general timeout to just 7 seconds.

The "connect to listener on [::]:80" error is quite unusual.

ETIMEDOUT
   Timeout while attempting connection. The server may be too busy to
accept new connections. Note that for IP sockets the timeout may be
very long when syncookies are enabled on the server.

cat /proc/sys/fs/file-nr

cat /proc/$(pidof -s httpd)/limits


Sincerely,
Alexandr Normalex

Hi Alexandr, thanks for taking a look at this with me.

The traffic pattern for this website is that at certain times of the day it receives huge spikes of traffic in very short periods of time. We are trying to tune Apache to accommodate them as best we can.

cat /proc/$(pidof -s httpd)/limits

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             55296                55296                processes 
Max open files            1024                 1024                 files     
Max locked memory         32768                32768                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       55296                55296                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0     

cat /proc/sys/fs/file-nr
1530    0       560543
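
For reference, the three numbers in /proc/sys/fs/file-nr are the allocated file handles, the allocated but unused handles, and the system-wide maximum. Note that this is a system-wide counter, while "Max open files" above is a per-process limit. A small sketch that turns the line into a usage ratio (the function name is just an illustrative choice):

```python
def parse_file_nr(line: str) -> tuple:
    """Split a /proc/sys/fs/file-nr line into (allocated, unused, maximum)."""
    allocated, unused, maximum = (int(field) for field in line.split())
    return allocated, unused, maximum


allocated, unused, maximum = parse_file_nr("1530    0       560543")
# Fraction of the system-wide file handle limit currently in use.
print(f"{(allocated - unused) / maximum:.2%}")
```

With the values shown here the system as a whole is nowhere near its file handle limit, so only the per-process limit is a candidate.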

Looking at Max open files I see what is likely the problem :)
Max open files            1024

I swear I modified this to 4096! I've changed the limit to 4096 now, I'll double check it tomorrow. Hopefully this will be the obvious fix!

I will check service httpd fullstatus and netstat -pant tomorrow morning when this happens again. It happens at the same time every day. It is not an attack; the customer's application receives massive amounts of connections at certain times of the day.

I've been working with Apache for 15 years and I've never seen the "connect to listener on [::]:80" error message before. I hope it's related to reaching Max open files.

Thanks again for your help.

--
PJ


I was hoping this would be fixed now that Max open files has been updated, but I saw the same issue this morning.

cat /proc/$(pidof -s httpd)/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             55296                55296                processes 
Max open files            1024                 1024                 files     
Max locked memory         32768                32768                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       55296                55296                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0    
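
Because /proc/<pid>/limits shows what the running process actually has, it is the place to confirm whether a limit change took effect. A minimal sketch that pulls one limit out of that output follows; the sample text is a shortened copy of the output above. One possible reason for the value staying at 1024, which is an assumption and not something confirmed in this thread, is that the limit was raised for login sessions but not in the environment that starts httpd.

```python
import re


def parse_limits(text: str) -> dict:
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines():
        # Columns are padded with runs of spaces; names contain single spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 3 or cols[0] == "Limit":
            continue  # skip blank lines and the header row
        limits[cols[0]] = (cols[1], cols[2])
    return limits


sample = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             55296                55296                processes
Max open files            1024                 1024                 files
Max nice priority         0                    0
"""

soft, hard = parse_limits(sample)["Max open files"]
print(soft, hard)
```

Running the same check against the httpd parent process after every restart shows right away whether the new value was picked up.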

Once it reaches 1000 total children

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 17 idle, and 1002 total children

After 1000 total children

mpm_common.c(663): (70007)The timeout specified has expired: connect to listener on [::]:80
mpm_common.c(663): (70007)The timeout specified has expired: connect to listener on [::]:80
mpm_common.c(663): (70007)The timeout specified has expired: connect to listener on [::]:80

Until apache is restarted.

I tried to run service httpd fullstatus during this time but it wasn't able to connect:

ELinks: Connection refused.

I did capture the output of netstat -pant which shows many connections to the MySQL DB as well.
I've double checked MySQL has not reached max connections and that it's still working during this time.

netstat output is so big I have to put it up on pastebin:

I don't understand why this is happening at 1000 children. What limit is it hitting?

Apache config:

Timeout 30

KeepAlive On
MaxKeepAliveRequests 10000
KeepAliveTimeout 3

<IfModule prefork.c>
StartServers      80
MinSpareServers   50
MaxSpareServers  120
ServerLimit     3500
MaxClients      3500
MaxRequestsPerChild  4000
</IfModule>


Any help would be greatly appreciated.

--
PJ


Haha, Max open files still says 1024!! I hardcoded it to 16384 yesterday, something keeps resetting it!

Let me figure this out before I keep bugging the list :)

Thanks,

--
PJ


Same issue this morning:

[Wed May 02 07:01:57 2012] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 48 idle, and 1004 total children

[Wed May 02 07:02:16 2012] [debug] mpm_common.c(663): (70007)The timeout specified has expired: connect to listener on [::]:80
[Wed May 02 07:02:23 2012] [debug] mpm_common.c(663): (70007)The timeout specified has expired: connect to listener on [::]:80
[Wed May 02 07:02:30 2012] [debug] mpm_common.c(663): (70007)The timeout specified has expired: connect to listener on [::]:80

--snip--

And the site was down.
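
To see how the child count climbs before the timeouts start, the "server seems busy" lines can be pulled out of the error log and reduced to their numbers. A minimal sketch, assuming the message wording shown in the log lines above; the function name is only illustrative.

```python
import re

BUSY_RE = re.compile(
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)


def parse_busy_line(line: str):
    """Return the spawning/idle/total counts from a 'server seems busy' line, or None."""
    match = BUSY_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "[Wed May 02 07:01:57 2012] [info] server seems busy, (you may need to "
    "increase StartServers, or Min/MaxSpareServers), spawning 32 children, "
    "there are 48 idle, and 1004 total children"
)
print(parse_busy_line(line))
```

Applied to every line of the error log, this gives a series of totals that can be lined up against the time of the first "connect to listener" message.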

I've confirmed the Max open files setting has been fixed:

Max open files            16384                16384                files

Anyone else have any insight on what the "(70007)The timeout specified has expired: connect to listener on [::]:80" error is and why it happens every day after reaching 1000 children?

Not sure where else to look.

Thanks in advance.

--
PJ

