On 8/9/2016 12:07 PM, Jacob Champion wrote:
> At this point, my primary suspect is our use of recycled OVERLAPPED structs without reinitializing them to zero. To make matters worse, we're setting the OVERLAPPED's internal .Pointer field in the AcceptFilter 'data' case -- which we're not supposed to be doing to begin with [1]. We don't do that in the 'connect' filter.
>
> This is all just theorycrafting, though. I'll try to reproduce on my end too.
>
> --Jacob
>
> [1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms684342(v=vs.85).aspx (the Members > Pointer section)

I think I've finally had some success finding a reproduction of this issue, though it's somewhat involved. I set up an instance of Apache 2.4.16 64-bit (built from source) on a Windows 7 machine and spun up an instance of WANem (http://wanem.sourceforge.net/) in a VirtualBox VM hosted on my client machine (also Windows 7).
WANem configuration (Advanced Mode):

  Bandwidth - 100Mbps
  Random Disconnect Type - tcp-reset
  Random Disconnect MTTF Low - 1
  Random Disconnect MTTF High - 3
  Random Disconnect MTTR Low - 0
  Random Disconnect MTTR High - 0

This instructs WANem to inject a TCP RST into connections that pass through it every 1 to 3 seconds (then recover after 0 seconds).
Then on my client machine, I added a route to the server that passes through the WANem gateway (cmd prompt: ROUTE ADD <server-ip> <WANem-ip>).
Finally, I ran a program on the client that makes 10 cURL requests in parallel repeatedly, performing a GET on a simple index.html page (well, a 28 KB HTML page). Eventually, even requests made to localhost on the server machine stop responding (they hang until the client times out).
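For anyone who wants to try something similar without cURL, here is a rough sketch of that kind of client in Python. It is not the program I ran: it swaps in urllib and a thread pool, and the names fetch, run_batch and hammer, the timeout value and the URL are all made up for illustration.

```python
import collections
import concurrent.futures
import http.client
import urllib.request


def fetch(url, timeout=10.0):
    """GET url once; return the status code as a string, or the
    exception class name if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the whole page is transferred
            return str(resp.status)
    except (OSError, http.client.HTTPException) as exc:
        # Covers resets, timeouts, URLError/HTTPError and truncated responses.
        return type(exc).__name__


def run_batch(url, parallel=10, timeout=10.0):
    """Issue `parallel` GETs at once and return their outcomes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda _: fetch(url, timeout), range(parallel)))


def hammer(url, rounds, parallel=10, timeout=10.0):
    """Repeat run_batch `rounds` times and tally the outcomes."""
    totals = collections.Counter()
    for _ in range(rounds):
        totals.update(run_batch(url, parallel, timeout))
    return totals
```

Calling hammer("http://<server-ip>/index.html", rounds=100) and printing the result gives a tally of status codes and error names. Under the setup described here I would expect some reset errors from WANem all along, and a switch to nothing but timeouts would be the sign that the server has stopped responding.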
Nothing shows in the error logs (I tried up to debug verbosity), and once it reproduces, no more entries appear in the access logs. I have to restart the server, though I haven't tried letting it sit for a period of time to see if it recovers on its own.
When I do the whole process again with "AcceptFilter http connect", it does not reproduce, and requests continue to work (when not being reset by WANem).
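For reference, the comparison comes down to a single httpd.conf directive. The fragment below is only a sketch of the two variants; it assumes the failing runs were using the 'data' filter discussed in the quoted message, whether set explicitly or picked up as the default for this build.

```
# Variant that hangs for me (assumed: the 'data' accept filter)
#AcceptFilter http data

# Variant that does not reproduce the hang
AcceptFilter http connect
```

Only one of the two lines should be active at a time, and the server needs a restart after switching.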
Not easy to set up, but at least it doesn't involve a browser or specific content on the server. I've seen it reproduce almost immediately, but it usually does so within 10 seconds or so.
I'll see if Wireshark shows anything interesting going on around the RSTs.

--
Paul Spangler
LabVIEW R&D
National Instruments