We have been observing strange problems with Apache 2.0 at one of our client sites for many months now. A bit of background: we sell specialized server software that is currently running on hundreds of machines at scores of customer sites. Apache is used to serve static content and to act as a front end to Tomcat via mod_jk.

The problem I'm about to describe has only been observed at one customer site, and the most obvious unique property of this site is Gigabit Ethernet to every server and client workstation. The server in question is running Red Hat Linux 6.2 (kernel 2.2.14-5.0smp) with Apache 2.0.52. Client workstations run various flavours of Windows, but I have also observed the problem when running my test client on other Linux machines.

Here's what happens: a small fraction of HTTP responses are truncated before the entire response body has been sent. (The fraction seems to vary from 1/20 to 1/10,000 depending on who's doing the testing, what hardware is involved, phase of moon, etc.)

To diagnose the problem, I wrote a Python program that implements this algorithm:

    for i = 1 .. M:
        open connection to <host>
        for j = 1 .. N:
            send request for <uri>
            read Content-Length
            read response body
            ensure number of bytes read == Content-Length
        close connection

So if all goes well, this requests the same file M*N times, reusing each connection for N requests.

Failures always seem to come in pairs. The first one looks like this:

    FAIL: read 110228 bytes (expected 131072) in 15.0 sec (7.2 kB/sec)

From packet sniffing (tcpdump on the server, Ethereal on the client), I've determined that the sequence of events for this failure is:

  * client sends request: "GET" + headers
  * server starts response: "200 OK" + headers + first chunk of body
  * server sends most of the body (1460 bytes per TCP segment), with a steady stream of TCP ACK segments from the client
  * server "freezes" for 15 sec and then sends a TCP FIN ACK segment, i.e. the connection is closed by the server
  * client gets end-of-file on the next read and reports failure (bytes read != Content-Length)

The second failure appears to be an unavoidable consequence of the first one: httplib.py (the standard Python HTTP client library) attempts to read a response line from a closed socket and barfs:

    FAIL: HTTP error
    Traceback (most recent call last):
      File "./httptest.py", line 93, in run_connection
      File "./httptest.py", line 111, in send_request
      File "/usr/lib/python2.3/httplib.py", line 779, in getresponse
      File "/usr/lib/python2.3/httplib.py", line 273, in begin
      File "/usr/lib/python2.3/httplib.py", line 237, in _read_status
    BadStatusLine

Then the client falls out to the outer loop, catches the exception, opens a new connection, and carries on quite happily.

From client workstations running Windows, I would say that rather more than 1/100 requests fail. (Or perhaps I should say 2/100, since failures come in pairs *with my test client*; other HTTP clients might do a better job of detecting a closed connection and opening a new one automatically.) From other server machines (also Linux boxes), failures seem to be more on the order of 1/10,000 requests. That's based on two runs of 10,000 requests each, so hardly scientific.

This could be a question of network hardware, network topology, device drivers, OS TCP stack, ... who knows. I'm pretty sure it's *not* the HTTP client library, though, since we have observed failures in Java programs, in C++ programs (using WinINet as the client library), and in my Python test program.

One more data point: the failure does not appear to happen with other HTTP servers. I wrote a trivial single-threaded HTTP/1.0 server in Python, and we have not seen failures with it. And many months ago we experimented with connecting to Tomcat directly instead of going through Apache, and the failures disappeared. (There are various good reasons to keep the dual Apache/Tomcat setup: SSL, CGI, mod_rewrite, ...)
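For anyone who wants to try reproducing this, the test algorithm above can be sketched in Python 3 roughly as follows. This is not the original httptest.py (which used Python 2.3's httplib, hence the traceback); it assumes Python 3's http.client, whose HTTPResponse.read() raises IncompleteRead when the server closes the connection before Content-Length bytes have arrived. The names check_body, run_connection and run_test, and the command-line form, are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of the keep-alive test client: M connections x N requests each.

Assumes Python 3's http.client (the renamed httplib). The names
check_body, run_connection and run_test are hypothetical.
"""
import http.client
import sys
import time


def check_body(read_bytes, expected, elapsed):
    """Return None if the whole body arrived, else a FAIL line in the
    same format as the output quoted above (1 kB = 1024 bytes)."""
    if read_bytes == expected:
        return None
    rate = read_bytes / max(elapsed, 1e-9) / 1024  # guard against elapsed == 0
    return "FAIL: read %d bytes (expected %d) in %.1f sec (%.1f kB/sec)" % (
        read_bytes, expected, elapsed, rate)


def run_connection(host, port, uri, n):
    """Send up to n GET requests for uri over one persistent connection.

    Returns (ok_count, failure_messages). The connection is abandoned at
    the first failure, so each truncation is counted once, not as a pair.
    """
    conn = http.client.HTTPConnection(host, port, timeout=60)
    ok, failures = 0, []
    try:
        for _ in range(n):
            start = time.monotonic()
            try:
                conn.request("GET", uri)
                resp = conn.getresponse()
                # -1 if the header is missing (e.g. a chunked reply); that
                # never matches a byte count, so it is reported as a failure.
                expected = int(resp.getheader("Content-Length", "-1"))
                try:
                    read_bytes = len(resp.read())
                except http.client.IncompleteRead as exc:
                    # Server closed the socket before Content-Length bytes arrived.
                    read_bytes = len(exc.partial)
            except (http.client.HTTPException, OSError, ValueError) as exc:
                # e.g. BadStatusLine, refused connection, bad Content-Length
                failures.append("FAIL: HTTP error %r" % (exc,))
                break
            msg = check_body(read_bytes, expected, time.monotonic() - start)
            if msg is not None:
                failures.append(msg)
                break  # this connection is dead; reconnect in the outer loop
            ok += 1
    finally:
        conn.close()
    return ok, failures


def run_test(host, port, uri, m, n):
    """Outer loop: m connections, each carrying up to n requests."""
    total_ok, all_failures = 0, []
    for _ in range(m):
        ok, failures = run_connection(host, port, uri, n)
        total_ok += ok
        all_failures.extend(failures)
        for msg in failures:
            print(msg, file=sys.stderr)
    return total_ok, all_failures


if __name__ == "__main__" and len(sys.argv) == 6:
    # Hypothetical usage: python3 <script> HOST PORT URI M N
    host, port, uri, m, n = sys.argv[1:6]
    ok, failures = run_test(host, int(port), uri, int(m), int(n))
    print("%d ok, %d failed" % (ok, len(failures)))
```

Unlike the original client, this sketch drops a connection after the first short read instead of sending the next request on it, so a truncation shows up as a single FAIL line rather than a pair.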
Oh yeah, one more thing: this problem only started appearing when we upgraded to Apache 2.0 (in order to use Tomcat). Until about 18 months ago, this server was running Apache 1.3 with JServ, and we never had a problem. (Apart from JServ being a pain in the neck. ;-)

So... has anyone else witnessed weird problems with Apache 2.0 over gigabit networks? My gut instinct says it's not *just* Apache, and not *just* the hardware, and not *just* Linux, and not *just* Windows, but some combination of those or various other factors. Maybe Apache is tickling the hardware (or the kernel) in a way that exposes bugs?

Any ideas are welcome!

Greg

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
To unsubscribe from the digest, e-mail: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx