Hello,

I'm having a problem which is hard to isolate and hard to reproduce, so I'll try to give you as much information as possible. First, some version details:

    CentOS release 3.9 (Final)
    Linux 2.4.21-53.ELsmp #1 SMP Mon Dec 3 13:34:41 EST 2007 i686 i686 i386 GNU/Linux
    Server: Apache/2.0.46 (CentOS)

The situation is as follows: we have an apache that, among other things, serves podcasts to users. The podcasts are just normal MP3 files on an NFS share, and apache is configured with that NFS share as DocumentRoot for the appropriate virtual host. Up to several million requests are served per week.

The problem is that some users get aborts: the file transfer just ends in the middle of the file. It happens often enough that we get complaints from users every day, but we haven't been able to reproduce it, no matter how hard we hammer our web server (using, for example, jmeter or a wget loop). It's also hard to identify the aborts that users have actually reported, because we have a lot of requests and many of them are deliberately aborted by the users.

However, using a special debug log, a custom sniffer of our own and several other methods, I believe I've been able to isolate some of those webserver file transfer aborts. Let's look at two of these aborts in the debug log, which has the following format (mod_logio enabled):

    CustomLog /var/log/apache/download-debug_log "size:%B sent:%O time:%D conn:%X file:%f"

    status:200 size:2859008 sent:59188 duration:15523375 conn:X file:/some/podcast.mp3
    status:200 size:2859008 sent:59188 duration:15515576 conn:X file:/some/podcast.mp3

(nothing in the error log)

I omitted some fields from the actual debug log to keep the information relevant and for privacy reasons. Both entries are HTTP/1.1, are for the same file, and have the same local and remote IP address; the User-Agent is IE7 and they are about 20 minutes apart.
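To separate suspicious aborts from ordinary user cancellations, one option is to look for the same file being cut off at the same byte count more than once. The following is only a minimal sketch, assuming the field layout of the two sample entries above (the real debug log has more fields); the function names truncated_entries and repeated_cutoffs are hypothetical. It relies on conn:X being the %X marker for a connection aborted before the response completed, and on %O counting headers as well as body.

```python
import re
from collections import Counter

# Field layout assumed from the sample entries quoted in this post;
# the real debug log contains additional fields that were omitted.
ENTRY = re.compile(
    r"status:(?P<status>\d+) size:(?P<size>\d+) sent:(?P<sent>\d+) "
    r"duration:(?P<duration>\d+) conn:(?P<conn>\S) file:(?P<file>.+)"
)


def truncated_entries(lines):
    """Yield entries for 200 responses that were aborted before completion."""
    for line in lines:
        m = ENTRY.search(line)
        if m is None:
            continue
        size, sent = int(m["size"]), int(m["sent"])
        # sent (%O) includes status line and headers, so sent < size
        # means the body was definitely cut short.
        if m["status"] == "200" and m["conn"] == "X" and sent < size:
            yield {"file": m["file"].strip(), "size": size, "sent": sent}


def repeated_cutoffs(lines, min_count=2):
    """Return {(file, sent): count} for cutoffs seen at least min_count times."""
    counts = Counter((e["file"], e["sent"]) for e in truncated_entries(lines))
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = [
        "status:200 size:2859008 sent:59188 duration:15523375 conn:X file:/some/podcast.mp3",
        "status:200 size:2859008 sent:59188 duration:15515576 conn:X file:/some/podcast.mp3",
    ]
    for (path, sent), n in repeated_cutoffs(sample).items():
        print(f"{path}: cut off at {sent} bytes, {n} times")
```

A user who cancels a download is unlikely to stop at exactly the same byte twice, so repeated (file, sent) pairs are better candidates for server-side aborts than single conn:X entries.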
As you can see, apache (correctly) detects that the file size is 2859008 bytes, but it sent only 59188 bytes (including status line, headers and everything). Strangely, this happens at the exact same byte for requests which are 20 minutes apart, so some kind of hardware glitch that kills processes or similar seems unlikely.

How do I know that it's really apache that is terminating the connection? Well, I'm not exactly sure, but at least I know it's on the server side, not on the client side, because I was able to record the first few and last few packets of podcast downloads. Here are the last few packets of the first request (note that the server sends its FIN, in the FP packet, before the client does):

    www > cli: P 282541947:282543399(1452) ack 935053929 win 6432
    cli > www: . ack 282539043 win 17424
    www > cli: . 282543399:282544851(1452) ack 935053929 win 6432
    www > cli: . 282544851:282546303(1452) ack 935053929 win 6432
    www > cli: . 282546303:282547755(1452) ack 935053929 win 6432
    cli > www: . ack 282540495 win 17424
    www > cli: . 282547755:282549207(1452) ack 935053929 win 6432
    www > cli: . 282549207:282550659(1452) ack 935053929 win 6432
    cli > www: . ack 282543399 win 17424
    www > cli: . 282550659:282552111(1452) ack 935053929 win 6432
    www > cli: . 282552111:282553563(1452) ack 935053929 win 6432
    www > cli: FP 282553563:282555015(1452) ack 935053929 win 6432
    cli > www: . ack 282546303 win 17424
    cli > www: . ack 282547755 win 17424
    cli > www: . ack 282550659 win 17424
    cli > www: . ack 282552111 win 17424
    cli > www: . ack 282555016 win 17424
    cli > www: F 935053929:935053929(0) ack 282555016 win 17424
    www > cli: . ack 935053930 win 6432

I omitted timestamps and IP addresses (nothing suspicious there; all packets are well within one second). The last few packets of the second request look *exactly* the same, only with different sequence numbers of course.
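With many captures to go through, the "who closed first" question can be answered mechanically. This is a rough sketch, assuming the abbreviated "src > dst: flags ..." line format used in the trace above (real tcpdump output also carries timestamps and addresses, which would need a different pattern); first_fin_sender is a hypothetical helper name.

```python
import re

# Matches the abbreviated trace format shown in this post:
#   "<src> > <dst>: <flags> ..."
PACKET = re.compile(r"^(?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)")


def first_fin_sender(lines):
    """Return the sender of the first packet carrying a FIN flag, or None."""
    for line in lines:
        m = PACKET.match(line.strip())
        if m and "F" in m["flags"]:
            return m["src"]
    return None


if __name__ == "__main__":
    trace = [
        "www > cli: . 282552111:282553563(1452) ack 935053929 win 6432",
        "www > cli: FP 282553563:282555015(1452) ack 935053929 win 6432",
        "cli > www: . ack 282555016 win 17424",
        "cli > www: F 935053929:935053929(0) ack 282555016 win 17424",
    ]
    print(first_fin_sender(trace))
```

For the trace quoted above, the first FIN comes from www, which matches the reading that the server side ends the transfer.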
The browser request and server response were as follows, extracted from the first few packets:

    GET <filename omitted> HTTP/1.1
    Accept: */*
    UA-CPU: x86
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)
    Connection: Keep-Alive
    Host: <omitted>

    HTTP/1.1 200 OK
    Date: Wed, 21 May 2008 18:22:32 GMT
    Server: Apache/2.0.46 (CentOS)
    Last-Modified: Tue, 20 May 2008 14:49:20 GMT
    ETag: "471c70-2ba000-95cf1c00"
    Accept-Ranges: bytes
    Content-Length: 2859008
    Connection: close
    Content-Type: audio/mpeg
    X-Pad: avoid browser bug

So there is nothing special there either: apache DOES report a correct Content-Length, but fails to deliver the whole file before closing the connection.

Does anybody have an idea what the problem could be, or at least what I could do to narrow it down further, get more debug information, etc.? The server is very busy and we weren't able to reproduce the problem at will, so things like strace are pretty useless without some kind of filter. It was difficult enough to get the packet captures.

The problem is especially annoying because at least users of IE7 aren't able to redownload a broken file. The reason is simple: though IE7 should know that the downloaded file was incomplete (the Content-Length header is correct), it ignores this fact and sends a conditional GET with an If-Modified-Since header on subsequent requests. Apache of course answers with status 304 that, no, the file hasn't changed, and doesn't resend it. This is visible in the debug log: in the twenty minutes between the two aforementioned requests, there are a lot of 304 responses for that IP/file combination. I think this behaviour of IE7 is strange and annoying (on subsequent tries, it really should GET the file unconditionally if the Content-Length didn't match the amount of data transferred, or do a partial GET for the remainder), but this isn't really the issue here.
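For reproduction attempts, a test client that compares the bytes it actually received against the advertised Content-Length would flag a truncated transfer immediately, which a plain wget loop may not make obvious. The following is only a sketch using Python's standard http.client; the function name check_download and its parameters are hypothetical, and it assumes the server sends a Content-Length header, as in the response shown above.

```python
import http.client


def check_download(host, path, port=80, timeout=30):
    """GET one file and return (received_bytes, expected_bytes, complete)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # Assumes Content-Length is present, as in the response above.
        expected = int(resp.getheader("Content-Length"))
        received = 0
        try:
            while True:
                chunk = resp.read(65536)
                if not chunk:
                    break  # EOF, possibly before the full body arrived
                received += len(chunk)
        except http.client.IncompleteRead as exc:
            # Some Python versions raise on early EOF; count what arrived.
            received += len(exc.partial)
        return received, expected, received == expected
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder host and path; substitute the real virtual host and file.
    got, want, ok = check_download("www.example.com", "/some/podcast.mp3")
    print(f"received {got} of {want} bytes, complete={ok}")
```

Run in a loop against the affected virtual host, this would log each short transfer together with the exact byte count, which can then be compared with the sent: values in the debug log.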
Thanks a lot for your help, and let me know if you need any more details!

Best Regards,
Julien

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.