"apr_thread_create: unable to create worker thread" messages with an overloaded apache2 under systemd

Erik Wasser <ewasser@xxxxxxxxxxxxx> · Sun, 28 Oct 2018 15:49:56 +0100

Hi everyone,

I stumbled upon some strange issues with apache2. First, some facts
about the environment in which apache2 was running:

* Virtuozzo 7 with a stripped down version of Ubuntu 18.04 as container OS.
* The thread/processes limit is set to 1500 for the container.
* The apache2 is using the `mpm_worker_module` with this configuration:

<IfModule mpm_worker_module>
    ServerLimit              16
    StartServers             3
    MinSpareThreads          75
    MaxSpareThreads          250
    ThreadLimit              64
    ThreadsPerChild          25
    MaxRequestWorkers        400
    MaxConnectionsPerChild   10000
</IfModule>

* I'm using apache2-2.4.29-1ubuntu4.4.

* I was doing some benchmarking and used the command
  `ab -kc 1000 -t 60 http://foo.invalid/` for it. The returned page was
  a simple static HTML page with no FastCGI/PHP or any other "fancy"
  stuff.

I was expecting some delay in the benchmark because the number of
connections (`1000`) was larger than the upper thread limit of the
`mpm_worker_module`. This should not be a problem at all, because
apache2 shows good error messages if it runs into a process/thread
limit.

But in this case, this is what I found in the `error.log` (timestamps
removed for readability):

… [mpm_worker:alert] [pid 2835:tid 140106655919040] (11)Resource temporarily unavailable: AH00282: apr_thread_create: unable to create worker thread
… [mpm_worker:alert] [pid 2898:tid 140106655919040] (11)Resource temporarily unavailable: AH00282: apr_thread_create: unable to create worker thread
… [mpm_worker:alert] [pid 2938:tid 140106655919040] (11)Resource temporarily unavailable: AH00282: apr_thread_create: unable to create worker thread

Or

… [mpm_worker:crit] [pid 25242:tid 140005187725056] (22)Invalid argument: AH03139: ap_queue_pop failed
… [mpm_worker:crit] [pid 25242:tid 140005187725056] (22)Invalid argument: AH03139: ap_queue_pop failed
… [mpm_worker:crit] [pid 25242:tid 140005187725056] (22)Invalid argument: AH03139: ap_queue_pop failed
… [mpm_worker:alert] [pid 25396:tid 140005187929856] (11)Resource temporarily unavailable: AH03142: apr_thread_create: unable to create worker thread

Or

… [core:error] [pid 401:tid 140005310155712] AH00546: no record of generation 0 of exiting child 25241
… [core:error] [pid 401:tid 140005310155712] AH00546: no record of generation 0 of exiting child 25396

And no other message.

Sometimes apache2 hangs with 100% CPU usage even long after the ab
test is over. Here is the line from `htop`:

> 5875 www-data  20  0  372M  3340  1068 S 198.  0.1  0:31.16 /usr/sbin/apache2 -k start

The load went up to 20 even without any further requests. I have to
restart apache2 to fix this.

`strace` during the ab run shows this:

2835  clone(child_stack=0x7f6d09becf70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6d09bed9d0, tls=0x7f6d09bed700, child_tidptr=0x7f6d09bed9d0) = -1 EAGAIN (Resource temporarily unavailable)
2835  clone(child_stack=0x7f6d09becf70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6d09bed9d0, tls=0x7f6d09bed700, child_tidptr=0x7f6d09bed9d0) = -1 EAGAIN (Resource temporarily unavailable)
2835  clone(child_stack=0x7f6d09becf70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6d09bed9d0, tls=0x7f6d09bed700, child_tidptr=0x7f6d09bed9d0) = -1 EAGAIN (Resource temporarily unavailable)

So something was preventing apache2 from creating new threads. To make
a long story short: it was systemd, which limited the total number of
tasks (processes and threads) in the apache2 cgroup to 225:

$ systemctl status apache2 | grep Tasks
    Tasks: 194 (limit: 225)
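
The mismatch between the MPM configuration and this limit can be checked with simple arithmetic. The following is only a rough sketch: the function name is made up for illustration, and it counts worker threads only. It ignores the extra threads in each child and the parent process, so the real task count would be somewhat higher than the number it reports.

```python
import math


def worker_thread_budget(max_request_workers, threads_per_child, tasks_limit):
    """Compare the configured worker threads with a cgroup task limit.

    Lower bound only: listener threads, each child's main thread and the
    parent process are not counted (assumption made for simplicity).
    """
    # Number of child processes needed to reach MaxRequestWorkers.
    children = math.ceil(max_request_workers / threads_per_child)
    worker_threads = children * threads_per_child
    return {
        "children": children,
        "worker_threads": worker_threads,
        "exceeds_limit": worker_threads > tasks_limit,
        # How many full children fit if only worker threads are counted.
        "children_within_limit": tasks_limit // threads_per_child,
    }


# Values taken from the configuration and the systemctl output in this mail.
result = worker_thread_budget(
    max_request_workers=400, threads_per_child=25, tasks_limit=225
)
print(result)
```

With the values from this mail, the configuration asks for 16 children with 400 worker threads in total, while a limit of 225 tasks leaves room for at most 9 full children.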

After setting the systemd value `DefaultTasksMax=infinity`, everything
was fine, but two questions remain:

1. The error messages suggest that apache2 does not handle a failed
   clone() call gracefully. Is this a bug?

2. Can apache2 detect a cgroup task limit on its own, in order to catch
   this kind of misconfiguration? In this case the maximum number of
   configured threads was larger than the limit of the cgroup. Is it
   possible to detect this and write a warning to the log files?
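
To illustrate what such a check could look like from the outside, here is a minimal Python sketch that reads the task limit of the cgroup the current process runs in. It is not how apache2 does or would do it. It assumes the usual systemd layout with cgroups mounted under `/sys/fs/cgroup` (and the v1 pids controller under `/sys/fs/cgroup/pids`), it only looks at the process's own cgroup and not at limits set on parent cgroups, and the function names are invented for this example. Inside a container the paths may look different.

```python
from pathlib import Path


def pids_cgroup_relpath(proc_cgroup_text):
    """Find the cgroup that carries the pids controller.

    Takes the contents of /proc/<pid>/cgroup and returns a tuple
    (is_v2, path), or None if no suitable entry is found.
    """
    v2_path = None
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        if "pids" in controllers.split(","):
            return False, path      # cgroup v1 hierarchy with pids controller
        if hierarchy_id == "0" and controllers == "":
            v2_path = path          # cgroup v2 unified hierarchy
    if v2_path is not None:
        return True, v2_path
    return None


def parse_pids_max(text):
    """Parse the contents of a pids.max file; 'max' means no limit (None)."""
    text = text.strip()
    return None if text == "max" else int(text)


def read_task_limit(root="/sys/fs/cgroup"):
    """Return the task limit of the current cgroup, or None if unknown/unlimited."""
    try:
        found = pids_cgroup_relpath(Path("/proc/self/cgroup").read_text())
        if found is None:
            return None
        is_v2, rel = found
        # Assumed mount points: <root> for v2, <root>/pids for v1.
        base = Path(root) if is_v2 else Path(root) / "pids"
        return parse_pids_max((base / rel.lstrip("/") / "pids.max").read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("task limit:", read_task_limit())
```

A server could compare such a value with its configured maximum number of threads at startup and log a warning when the configuration cannot fit.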

Any thoughts?

--
Greetings
Erik Wasser

maxcluster GmbH
Technologiepark 8
DE-33100 Paderborn

Web : maxcluster.de
Tel.: +49 5251 414130
Fax : +49 5251 4141399

Registered office : Paderborn
Management : Sebastian Ringel, Alexander Wilhelm
Commercial register : Amtsgericht Paderborn HRB 10453
