Apache 2.2.4 on Fedora Core 5 Linux - Large file downloads timing out

<myles@xxxxxxxxxxx> · Wed, 16 May 2007 07:57:05 -0700

I have a very strange situation.  We have 4 colocated servers - 2 Windows
and 2 Linux.  One of the Linux servers provides a web application that
serves large files (about 20-40 MB in size).  Recently, as activity has
started to increase on this box, I have been getting reports from users that
their file downloads 'stall' or time out about 50% of the way through.
It's not consistent when they stall, but about 1 in 4 downloads appear to be
affected by this.

To test it, I ran some basic large file downloads from Apache and was able
to re-create the issue.  I tested this on BOTH Linux boxes, and it behaved
the exact same way on both systems.  This problem appears to have been
introduced on the server in the last 3 months or so, which could coincide
with new kernel updates on the boxes or with increased traffic.  I've tested with
2.6.18 and 2.6.19 kernels and both exhibit the identical behavior.
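To make the "about 1 in 4" figure measurable and repeatable across servers and kernels, a small client-side test can fetch the same file many times with a per-read timeout and count incomplete transfers. This is a minimal Python 3 sketch under stated assumptions: the function names, the URL and the file size are hypothetical, and the 30-second stall threshold is arbitrary.

```python
import http.client
import time
import urllib.request


def download_once(url, expected_size, stall_timeout=30.0, chunk_size=64 * 1024):
    """Fetch url once and report (bytes_received, elapsed_seconds, complete).

    stall_timeout is applied to each socket operation, so a transfer that
    stops delivering data for that long is abandoned instead of hanging.
    """
    received = 0
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=stall_timeout) as resp:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:  # EOF: body finished or server closed early
                    break
                received += len(chunk)
    except (OSError, http.client.HTTPException):
        # Timeouts, resets and URLError are OSError subclasses;
        # IncompleteRead is an HTTPException. All count as a failed download.
        pass
    elapsed = time.monotonic() - start
    return received, elapsed, received == expected_size


def failure_rate(url, expected_size, runs=20, stall_timeout=30.0):
    """Repeat the download and return the fraction that did not complete."""
    failures = 0
    for _ in range(runs):
        _, _, complete = download_once(url, expected_size, stall_timeout)
        if not complete:
            failures += 1
    return failures / runs
```

For example, `failure_rate("http://server.example/files/big.bin", 31457280, runs=40)` (hypothetical URL and size) returning roughly 0.25 would match the reported rate. Running the same call against a Linux box and a Windows box gives a like-for-like comparison.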

In order to rule out our switch or colocation provider, I ran the same tests
on Windows.  The Windows boxes have Apache 1.x on them.  They ran absolutely
fine.  No timeouts.  But their download speeds are significantly slower than
the Linux boxes.  I am not sure why this is the case, but speed isn't as
critical an issue for us as reliability is.  Consequently, I am currently
serving downloads from the Windows boxes until this issue is resolved.

I installed Lighttpd on the Linux boxes just to test this.  It ALSO had the
same problem as Apache: timeouts at various points during downloads.  This
led me to believe the problem is in the TCP/IP network configuration on
those boxes, or in something introduced by recent Linux updates
that affects web server performance with large file downloads.
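One way to test the TCP/IP configuration theory is to record the kernel's TCP settings on each box (and, if available, on a kernel that did not show the problem) and compare them. The sketch below reads values from `/proc/sys/net/ipv4`; the shortlist of sysctl names is an assumption chosen for illustration. These are real Linux settings, but nothing in this report shows that any of them is the cause.

```python
from pathlib import Path

# Hypothetical shortlist: real Linux TCP sysctls whose values may differ
# between kernel builds. Nothing in the report says these are the cause.
TCP_SETTINGS = (
    "tcp_window_scaling",
    "tcp_sack",
    "tcp_timestamps",
    "tcp_rmem",
    "tcp_wmem",
)


def read_tcp_settings(base="/proc/sys/net/ipv4", names=TCP_SETTINGS):
    """Map each sysctl name to its whitespace-normalised value, or None."""
    values = {}
    for name in names:
        try:
            values[name] = " ".join((Path(base) / name).read_text().split())
        except OSError:
            values[name] = None  # missing on this kernel or not readable
    return values


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Saving the output of `read_tcp_settings()` from each machine and passing two of them to `diff_settings` lists only the settings that differ, which narrows down what changed between kernels.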

Using strace, I was able to capture some of the activity that occurred during
the timeouts.  Here is what I found:

    poll([{fd=21, events=POLLIN, revents=POLLIN}], 1, 500000) = 1
    read(21, "", 8000)                      = 0
    gettimeofday({1179254838, 419949}, NULL) = 0
    shutdown(21, 1 /* send */)              = 0
    poll([{fd=21, events=POLLIN, revents=POLLIN|POLLHUP}], 1, 2000) = 1
    read(21, "", 512)                       = 0
    close(21)                               = 0
    read(6, 0xbfa01283, 1)                  = -1 EAGAIN (Resource temporarily unavailable)
    semop(163841, 0xcb570c, 1 <unfinished ...>
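A `read()` that returns 0 on a socket means the peer performed an orderly shutdown of its sending side. The lines that follow (`shutdown(21, 1 /* send */)`, a 2000 ms `poll`, a final `read` and `close`) look like a lingering close. If this trace belongs to one of the stalled downloads, it suggests that, from the server's point of view, the connection was ended from the client side rather than abandoned by the server. The Python sketch below reproduces that sequence on a local `socketpair()` (AF_UNIX on Linux, which has the same EOF semantics for this purpose). The function names are hypothetical, and the code is an illustration of the system calls, not Apache's implementation.

```python
import socket
import time


def peer_closed(sock, bufsize=8000):
    """Return True if a read on a readable socket reports EOF from the peer.

    Mirrors `read(21, "", 8000) = 0`: a zero-length read means the other
    side shut down its sending direction (a FIN on TCP), not that data is late.
    Note: if the peer sent data instead, that data is consumed here.
    """
    return sock.recv(bufsize) == b""


def lingering_close(sock, linger_seconds=2.0):
    """Half-close, drain briefly, then close: the pattern the trace shows."""
    sock.shutdown(socket.SHUT_WR)              # shutdown(21, 1 /* send */)
    deadline = time.monotonic() + linger_seconds
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)         # poll(..., 2000)
            if not sock.recv(512):             # read(21, "", 512) = 0
                break
    except OSError:
        pass                                   # timed out or reset: close anyway
    finally:
        sock.close()                           # close(21)
```

Capturing the same connection with a packet capture on the server would show whether that FIN (or a reset) really came from the client or from something in between.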

My Apache settings related to this are:

    EnableSendfile Off
    EnableMMAP Off
    KeepAlive On
    KeepAliveTimeout 400
    MaxKeepAliveRequests 400

However, a different Linux box with only the default values for these settings
gives identical timeout behavior, so these settings don't seem to have any
effect on this issue.
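A rough way to see what `KeepAliveTimeout 400` costs is Little's law (average occupancy = arrival rate × time held). Assuming one worker per connection (a prefork-style MPM) and clients that leave connections idle for the full timeout, the sketch below estimates how many workers sit waiting on idle keep-alive connections. The function name and the arrival rate are hypothetical. Such a shortage would mostly delay new connections rather than cut off transfers already in progress, which fits the observation that a box with default settings behaves the same.

```python
import math


def idle_keepalive_workers(new_connections_per_second, keepalive_timeout_s):
    """Rough count of workers parked waiting on idle keep-alive connections.

    Little's law (L = lambda * W): if each new connection leaves the server
    idle for the full KeepAliveTimeout after its last request, the average
    number of workers held in that wait is arrival rate times timeout.
    Assumes one worker per connection (prefork-style MPM).
    """
    return math.ceil(new_connections_per_second * keepalive_timeout_s)


for timeout in (5, 400):  # 5 s is the Apache 2.2 default; 400 s is the setting above
    print(timeout, idle_keepalive_workers(0.5, timeout))
```

At a hypothetical half a new connection per second, this gives 3 parked workers with the default timeout and 200 with 400 seconds, a figure worth comparing with the server's MaxClients.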

Now it would appear that my issues are not specific to Apache 2 (since I can
reproduce the same behavior with Lighttpd), even though they affect our
Apache 2 service.  What I am hoping is that someone out there is also
using their Linux box for large file downloads and can give me some
information on how I should set up my servers to increase reliability and
reduce timeouts.

All suggestions, comments or offers of help are greatly appreciated.

Regards,
Myles

==============================
Myles Wakeham
Director of Engineering
Tech Solutions USA, Inc.
Scottsdale, Arizona USA
www.techsol.org

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.