I have a very strange situation. We have 4 colocated servers: 2 Windows and 2 Linux. One of the Linux servers provides a web application that serves large files (about 20-40 MB in size). Recently, as activity has started to increase on this box, I have been getting reports from users that their file downloads 'stall' or time out about 50% of the way through the download. It's not consistent when they stall, but about 1 in 4 downloads appears to be affected.

To test it, I ran some basic large file downloads from Apache and was able to re-create the issue. I tested this on BOTH Linux boxes, and it behaved exactly the same way on both systems. The problem appears to have been introduced in the last 3 months or so, which could coincide with new kernel updates on the boxes or with increased traffic. I've tested with 2.6.18 and 2.6.19 kernels and both exhibit identical behavior.

In order to rule out our switch or colocation provider, I ran the same tests on Windows. The Windows boxes have Apache 1.x on them. They ran absolutely fine, with no timeouts, but their download speeds are significantly slower than the Linux boxes. I am not sure why this is the case, but speed isn't as critical an issue for us as reliability. Consequently, I am currently serving downloads from the Windows boxes until this issue is resolved.

I installed Lighttpd on the Linux boxes just to test this. It ALSO had the same problem as Apache: timeouts at various points during downloads. This led me to believe the problem is in the TCP/IP network configuration on those boxes, or in something introduced by recent Linux updates that affects web server performance with large file downloads.

Using strace, I was able to capture some of the activity that occurred during the timeouts.
Here is what I found:

    poll([{fd=21, events=POLLIN, revents=POLLIN}], 1, 500000) = 1
    read(21, "", 8000) = 0
    gettimeofday({1179254838, 419949}, NULL) = 0
    shutdown(21, 1 /* send */) = 0
    poll([{fd=21, events=POLLIN, revents=POLLIN|POLLHUP}], 1, 2000) = 1
    read(21, "", 512) = 0
    close(21) = 0
    read(6, 0xbfa01283, 1) = -1 EAGAIN (Resource temporarily unavailable)
    semop(163841, 0xcb570c, 1 <unfinished ...>

My Apache settings relevant to this are:

    EnableSendfile Off
    EnableMMAP Off
    KeepAlive On
    KeepAliveTimeout 400
    MaxKeepAliveRequests 400

However, a different Linux box with only the defaults for these settings shows identical timeout behavior, so these settings don't seem to have any effect on this issue.

It would appear that my issue is not specific to Apache 2, even though it shows up when serving through Apache 2, since I can reproduce the same behavior with Lighttpd. What I am hoping is that someone out there is also using a Linux box for large file downloads and can give me some information on how I should set up my servers to increase reliability and reduce timeouts.

All suggestions, comments or offers of help are greatly appreciated.

Regards,
Myles

==============================
Myles Wakeham
Director of Engineering
Tech Solutions USA, Inc.
Scottsdale, Arizona USA
www.techsol.org

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
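
P.S. For anyone who wants to try reproducing the stalls, here is a minimal sketch of a repeated-download test. It is only an illustration and not necessarily how the tests described above were run. It assumes Python 3; the 30-second read timeout, the default of 20 runs, and the names classify, download_once and summarize are all hypothetical choices made for this sketch.

```python
import collections
import http.client
import socket
import sys
import urllib.request

CHUNK = 64 * 1024  # bytes requested per read call (arbitrary choice)


def classify(received, expected, timed_out, broken=False):
    """Label one attempt as 'ok', 'stalled', or 'truncated'."""
    if timed_out:
        return "stalled"      # no data arrived within the read timeout
    if broken or (expected is not None and received < expected):
        return "truncated"    # connection ended before the full body arrived
    return "ok"


def download_once(url, timeout=30.0):
    """Fetch url and discard the body; return (status, received, expected)."""
    received, expected = 0, None
    timed_out = broken = False
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
            expected = int(length) if length is not None else None
            while True:
                chunk = resp.read(CHUNK)
                if not chunk:
                    break
                received += len(chunk)
    except socket.timeout:
        # Raised when a read blocks for longer than `timeout` seconds.
        timed_out = True
    except (http.client.HTTPException, OSError):
        # Connection reset, failed connect, incomplete chunked body, etc.
        broken = True
    return classify(received, expected, timed_out, broken), received, expected


def summarize(statuses):
    """Count how many attempts ended in each status."""
    return dict(collections.Counter(statuses))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python download_test.py <url> [runs]
    url = sys.argv[1]
    runs = int(sys.argv[2]) if len(sys.argv) > 2 else 20
    statuses = []
    for i in range(runs):
        status, received, expected = download_once(url)
        print(f"run {i + 1}: {status} ({received} of {expected} bytes)")
        statuses.append(status)
    print(summarize(statuses))
```

Pointing this at one of the large files and comparing the counts between the Linux and Windows boxes should show whether the roughly 1-in-4 failure rate holds, and the byte counts show how far each failed download got.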