On 04/02/2008 10:19:14 PM +0200, Brett Paden <paden@xxxxxxxxxxxx> wrote:
> Using Leo's C test on a 2.4.20 kernel, I am __unable__ to create 3000ms
> timeouts when doing localhost or interface connections to port 3306
> (obviously with a running mysql server). Same results with my test.
You mean 2.6.20, don't you? The 2.6 and 2.4 branches are far too
different for any comparison to be meaningful.
About the tests: we need to focus only on what is relevant to our
problem, and always test in the same situation. It's very difficult to
isolate problems and validate tests if the test protocol changes every
time. If it is consistent, I think you may be able to reproduce the
bug with 2 servers: one receiving connections, and the other, on which
we test the different kernels, initiating the TCP connections.
> *However*, if I run those same tests against other ports I am able to
> generate hangs.
Can you describe precisely how you "run" those tests? For each port you
test, you need to have a server application listening on it. You could,
for example, change the port MySQL listens on. Is your MySQL server in a
production environment? If not, try rebooting to flush the connection
table before each test run.
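
To make the idea of a fixed test protocol concrete, here is a minimal
sketch of such a test. It is written in Python rather than Leo's C, and
the function names, the backlog and the 2.5 second threshold are all
assumptions made for illustration, not anything taken from the tests
posted in this thread. It starts a throwaway listener (so that something
really is listening on the port under test), times each connect(), and
keeps the connects that took suspiciously long.

```python
import socket
import threading
import time


def start_listener(port=0, backlog=128):
    """Start a throwaway TCP listener on 127.0.0.1.

    port=0 lets the kernel pick a free port. Returns (socket, port).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", port))
    srv.listen(backlog)

    def accept_loop():
        # Accept and immediately close every incoming connection.
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # the listener socket was closed
            conn.close()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def time_connects(host, port, count=1000, timeout=10.0):
    """Open `count` connections one after another; return connect times in seconds."""
    durations = []
    for _ in range(count):
        start = time.monotonic()
        sock = socket.create_connection((host, port), timeout=timeout)
        durations.append(time.monotonic() - start)
        sock.close()
    return durations


def slow_connects(durations, threshold=2.5):
    """Return the connect times at or above `threshold` seconds (assumed cutoff)."""
    return [d for d in durations if d >= threshold]
```

For the two-server setup, you would run only time_connects() on the
machine whose kernel is under test and point it at the other machine,
e.g. slow_connects(time_connects("server-under-test", 3306)), where the
host name is a placeholder.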
I think it's very important to have precise and thorough test results
and protocols, and to double-check what we post in this thread, if we
want people to be interested in helping rather than dismissing this as
another bogus thread about MySQL config problems (and ignoring it!).
> Regardless of kernel, it appears that straight up connections to 3306
> behave differently than other ports. If, for example, I generate 1000
> connections very quickly to port 22 then run a netstat -na, I will see
> loads of those connections sitting in TIME_WAIT. If I run the
> identical test against port 3306 and do the same netstat I will see
> none of those connections sitting in TIME_WAIT. I'm guessing that MySQL
> does something aggressive with connections to that port and is possibly
> unrelated to our problem. Still, very interesting.
It's perfectly normal to have TIME_WAIT connections in your netstat.
MySQL probably sets some socket options such as SO_REUSEADDR, or
something of that kind, to reuse the sockets instead of leaving them in
the TIME_WAIT state (which is only useful on lossy networks).
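
For reference, this is roughly how an application sets that kind of
option. Whether MySQL actually does this is only a guess; the sketch
just shows the mechanism, and the helper names are made up for the
example. What SO_REUSEADDR itself guarantees is that a listener can bind
to an address while old connections on it are still in TIME_WAIT.

```python
import socket


def make_listener(port=0, reuse_addr=True):
    """Create a listening TCP socket on 127.0.0.1, optionally with SO_REUSEADDR.

    port=0 lets the kernel choose a free port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # Must be set before bind() to have any effect on the bind.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(5)
    return srv


def has_reuse_addr(sock):
    """Report whether SO_REUSEADDR is currently enabled on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
```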
Anyway, this last part is not relevant to the 3s delay bug.
When the bug happens, you see a lot of half-open TCP connections in the
"SYN_RECV" state on the server side. You can also capture packets with
tcpdump and see the results Leo showed: one SYN packet from client to
server, one SYN/ACK packet from server to client (not processed by the
client, thus not captured in non-promiscuous mode), then 3 seconds later
the same packets again and a final client ACK, establishing the TCP
session.
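
To check for this on the server without eyeballing the output, one
could count connections per TCP state in the output of netstat -na. The
sketch below assumes the usual Linux column layout (proto, recv-q,
send-q, local address, foreign address, state); the function name is
invented, and the sample text is made up to show the format, not
captured from a real machine.

```python
from collections import Counter


def count_tcp_states(netstat_output, local_port=None):
    """Count TCP connections per state in `netstat -na` style text.

    If local_port is given, only lines whose local address ends in
    ":<local_port>" are counted.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP and unix-socket lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if local_port is not None and not local.endswith(":%d" % local_port):
            continue
        counts[state] += 1
    return counts


# Invented sample in the usual Linux netstat layout (not real data).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN
tcp        0      0 192.168.0.10:3306       192.168.0.20:41234      SYN_RECV
tcp        0      0 192.168.0.10:3306       192.168.0.20:41235      SYN_RECV
tcp        0      0 192.168.0.10:22         192.168.0.20:50022      TIME_WAIT
"""
```

With real output you would pass in the text of netstat -na (for
instance read from a file) and look at the "SYN_RECV" count for the
port under test.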
Gabriel