Bizarre problem with Apache HTTPD, a number of Tomcats, mod_proxy_balancer and mod_jk - any ideas where to look for the root cause welcome

"Jürgen Göres" <jguni@xxxxxx> · Wed, 18 Mar 2020 13:40:57 +0100

Hi all,

we are currently observing a really bizarre problem on a customer system.
Our software runs a number of microservices on individual Tomcats, which we front with an Apache HTTPD (2.4.x) reverse proxy using mod_jk to route the requests by context. There is one exception, though: one of the microservices, which we added to the stack at a later point in time, uses WebSockets, which are not supported through the AJP protocol, so we are using mod_proxy_balancer here.
We put the ProxyPass etc. rules for mod_proxy_balancer in front of the directives related to mod_jk, and we have been mostly fine with this approach for a few years now. We have two sets of balancer specifications for mod_proxy_balancer and their associated rules: one for regular HTTP traffic, the other for WebSocket traffic ("ws:" or "wss:").

Let's name the microservices that are handled by mod_jk A, B, and C, and let's name the one handled by mod_proxy_balancer Z. Let's further assume that their request contexts are /a, /b, /c and /z, respectively.

Now about the current customer problem: the customer started experiencing very erratic system behaviour. In particular, requests that were meant for one of the microservices A-C handled by mod_jk would randomly give 404 responses. Usually, this situation would persist for an affected user for a few seconds, and reloading wouldn't resolve it. At the same time, other users accessing the very same microservice didn't have a problem. Pretty much all users were affected from time to time.

We did several troubleshooting sessions that turned up nothing. At some point, we started to monitor all kinds of traffic between HTTPD and the Tomcats with tcpdump, and here we found the bizarre thing:
When we ran tcpdump and filtered it to only show traffic between HTTPD and microservice Z (handled by mod_proxy_balancer), we sometimes saw requests that, based on the request URL (/a, /b, /c), were clearly meant for one of the OTHER microservices (A-C). Naturally, microservice Z has no idea what to do with these requests and responds with 404.
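
To make the check concrete, here is a minimal Python sketch of the filter we effectively applied by eye to the captured traffic. It assumes the simplified context names from above (/a, /b, /c, /z) and that the HTTP request lines have already been extracted from the capture into plain strings. The function names are made up for illustration and are not part of our product or of any Apache tooling.

```python
# Hypothetical helper: flag request lines seen on the HTTPD -> Z connection
# whose path does not belong to Z's context. Assumes the simplified
# contexts /a, /b, /c (mod_jk) and /z (mod_proxy_balancer) from this mail.

Z_CONTEXT = "/z"


def request_path(request_line):
    """Return the path from an HTTP request line such as 'GET /a/x HTTP/1.1'."""
    parts = request_line.split()
    if len(parts) < 2:
        return None
    # Drop the query string so only the path is compared.
    return parts[1].split("?", 1)[0]


def belongs_to_context(path, context):
    """True if path is the context itself or lies below it."""
    return path == context or path.startswith(context + "/")


def misrouted_requests(request_lines):
    """Return the request lines that should not have been sent to Z."""
    flagged = []
    for line in request_lines:
        path = request_path(line)
        if path is not None and not belongs_to_context(path, Z_CONTEXT):
            flagged.append(line)
    return flagged
```

Anything this returns for traffic captured on the link to Z is a request that, by its URL, should have gone to one of A-C instead.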

What else might be relevant:
- our microservices are stateless, so we can scale horizontally if we want. On that particular system, we have at least two instances of each microservice (A-C and Z)
- the installation is spread across multiple nodes
- the nodes run on Linux
- Docker is not used ;-)
- we have never seen this problem on any other system
- we haven't seen this problem on the customer's test system, but usage patterns there are different
- the requests with 404 responses wouldn't show up in the HTTPD's access log (where "normal" 404 requests DO show up).
- the customer had recently updated from a version of our product that uses Apache 2.4.34 to one using 2.4.41
- disabling the microservice Z (= no more balancer workers for mod_proxy_balancer) would resolve the problem
- putting the rules for mod_proxy_balancer after those of mod_jk (and adding an exclusion for /z there, because one of the other microservices is actually listening on the root context) would NOT change a thing

From experience, we are pretty sure that the problem is somewhere on our side. ;-)

- One thing we thought of: maybe a bug in microservice Z that is only triggered by this customer's use of our product causes the erratic behaviour of HTTPD/mod_proxy_balancer? Maybe something we do wrong is messing up the connection keepalive between Apache and Tomcat, causing requests to go the wrong way?
- Or maybe it is related to the Apache version update (2.4.34 to 2.4.41)? But why are other installations with the same version not affected?
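
One way we could test the keepalive idea is to group the captured requests by backend TCP connection and look at which connections carried requests for a foreign context. The following Python sketch shows that grouping. It assumes the capture has already been reduced to (connection_id, path) pairs, where connection_id is any stable key for one TCP connection (for example the HTTPD-side source port), and it uses the simplified /z context from above. The function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical analysis helper. Input records are (connection_id, path)
# pairs extracted from a capture of HTTPD -> Z traffic. Assumes Z's
# context is /z, as in the simplified naming used in this mail.


def connections_with_foreign_requests(records, context="/z"):
    """Map each connection to its paths, keeping only connections that
    carried at least one request outside the given context."""
    by_connection = defaultdict(list)
    for connection_id, path in records:
        by_connection[connection_id].append(path)

    def in_context(path):
        return path == context or path.startswith(context + "/")

    return {
        connection_id: paths
        for connection_id, paths in by_connection.items()
        if any(not in_context(p) for p in paths)
    }
```

If the foreign requests mostly appear on connections that previously carried legitimate /z traffic, that would hint at a connection reuse problem; connections carrying only foreign requests would rather suggest a wrong routing decision. Neither outcome would be conclusive on its own.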

Any ideas where we should start looking?

Regards

J
