mpm_winnt, websockets and restarts: increasing number of (blocked?) threads

Hi,

Short version:

I use httpd on Windows as a reverse proxy for a microservice system. Some services communicate over websockets (more precisely: SignalR). From time to time I have to restart the server so that it reads a new configuration. After each restart I observe a growing number of threads blocked by the SignalR connections. It's only a matter of time until the server freezes completely because no threads are left for other requests.

Details:

I reduced my system as much as possible and ended up with two microservices, A and B. A hosts a SignalR hub. Both A and B subscribe to the events of this hub, so there should be two connections.

Now the experiment:

1. Start the two microservices: They repeatedly try to connect, but fail. This is expected, because they are configured to connect via the reverse proxy and httpd is not running yet.
2. Start httpd (Windows Service): As expected, both services establish their connection, confirmed by the service logs and mod_status showing 2 connections.
3. Restart httpd: In production, I call
    httpd.exe -n "ServiceName" -k restart
   programmatically. For this experiment, I call it from PowerShell. What happens?
   3a. The parent starts a new child and hands over 2 sockets, see error.log on Pastebin
   3b. The parent needs to stop the old child. The old child cannot stop because of the open connections. It waits for a grace period of 30 s, then terminates the 2 threads. My services log that their connections were dropped and attempt to reconnect. At this moment, 2 more connections appear in mod_status. However, I don't see any socket handover in error.log.
4. Repeat httpd restart.
   4a. The parent starts a new child and hands over 2 sockets, see error.log. It's still 2 sockets, although I saw 4 connections in mod_status in the previous step.
   4b. The parent shuts down the old child. This time, there is no grace period, but 18(!) threads that failed to exit are terminated, see error.log. Both services log a disconnect and a reconnect. However, no additional connections appear in mod_status; the count remains 4.
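
To track the worker count after each restart without reading the mod_status page by hand, something like the following sketch could be used. It assumes that mod_status is reachable at http://localhost/server-status and that the machine-readable "?auto" view is enabled there; the URL and the function names are my own placeholders, not part of my actual setup.

```python
import urllib.request


def parse_status_auto(text):
    """Turn the 'Key: value' lines of mod_status '?auto' output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not 'Key: value' pairs
            fields[key.strip()] = value.strip()
    return fields


def busy_workers(text):
    """Return the BusyWorkers count reported by mod_status."""
    return int(parse_status_auto(text)["BusyWorkers"])


def fetch_status(url="http://localhost/server-status?auto"):
    """Fetch the status text. The URL is an assumption; adjust it to the local setup."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("ascii", errors="replace")
```

Calling busy_workers(fetch_status()) after every restart would give one number per step that can be compared with the socket counts in error.log.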
   
When I keep restarting httpd, most restarts behave as described in step 4. The only difference is a varying number of "threads that failed to exit". Sometimes, however, additional connections appear in mod_status. I can't reproduce this on purpose. I suspect a race between the old child shutting down, the new child starting, and my services trying to reconnect, but I don't know the httpd source code.
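
To compare the "threads that failed to exit" numbers across many restarts, the counts could be pulled out of error.log with a small script. This is only a sketch: it matches the phrase quoted above and assumes that a number directly precedes it in the log line. The function name is made up for illustration.

```python
import re

# Matches e.g. "... 18 threads that failed to exit"; the surrounding log line
# format is not assumed, only the phrase itself and a preceding number.
FAILED_EXIT = re.compile(r"(\d+) threads that failed to exit")


def failed_exit_counts(log_text):
    """Return the counts of threads that failed to exit, in log order."""
    return [int(m.group(1)) for m in FAILED_EXIT.finditer(log_text)]
```

Running it over the whole error.log yields one number per restart, which makes it easier to see whether the count correlates with the restarts that add connections in mod_status.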


To get my job done, I need to know: What can I do to keep the server from eventually blocking?
Out of curiosity, I would also like to know what exactly happens: how the SignalR connections are handed over to the next child, and why the first restart behaves differently from the later ones.
   
I appreciate any hint!


Some more information about server and configuration:
Version: 2.4.41
Some config snippets:

# handy for debugging, not in production
ThreadsPerChild 20

RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule "^/my/microservice" "wss://hostname:53728%{REQUEST_URI}" [P]
ProxyPass /my/microservice https://hostname:53728/my/microservice
ProxyPassReverse /my/microservice https://hostname:53728/my/microservice

Link to error.log on Pastebin: https://pastebin.com/7a7B0bLb
