On Tue, Apr 1, 2008 at 8:43 PM, Gabriel Barazer <gabriel@xxxxxxxx> wrote:
> On 04/01/2008 8:28:14 PM +0200, "H. Willstrand" <h.willstrand@xxxxxxxxx> wrote:
> > On Tue, Apr 1, 2008 at 7:59 PM, Gabriel Barazer <gabriel@xxxxxxxx> wrote:
> >> On 04/01/2008 7:17:31 PM +0200, Leo <neleo@xxxxxxx> wrote:
> >> > H. Willstrand wrote:
> >> >> On Tue, Apr 1, 2008 at 5:43 PM, Gabriel Barazer <gabriel@xxxxxxxx> wrote:
> >> >>
> >> >>> On 04/01/2008 4:43:20 PM +0200, Brett Paden <paden@xxxxxxxxxxxx> wrote:
> >> >>> >> If I'm right, Brett's problem lies in the test client (provided in
> >> >>> >> the first mail). This probably has to do with the number of ports
> >> >>> >> opened and closed during a short time period.
> >> >>> >
> >> >>> > My test client is designed to simulate the sort of load our
> >> >>> > production databases and web servers see. We're talking on the order
> >> >>> > of 100-400 connections per second. On an unloaded server the 3000ms
> >> >>> > delays occur right around 400 connections a second, but we have seen
> >> >>> > them at lower connection rates. Are you suggesting that we could do
> >> >>> > something simple (like reap TIME_WAIT connections) to alleviate the
> >> >>> > problem?
> >> >>>
> >> >>> Using tcp_tw_recycle / tcp_tw_reuse doesn't solve the problem on either
> >> >>> the client or the server. I tested with and without these options
> >> >>> enabled, and with netfilter's connection tracking disabled, and none of
> >> >>> this solved the delay. If even the "lo" interface is affected, the
> >> >>> problem is definitely in the network stack and not in the device drivers.
> >> >>>
> >> >>> Here is a thread I started on LKML about this very same bug.
> >> >>> http://lkml.org/lkml/2008/3/14/353
> >> >>> There is a forum thread with French hosting providers talking about it.
> >> >>> (if some of you read French:
> >> >>> http://www.webmasterclub.fr/forum/topic,59486,0.html)
> >> >>>
> >> >>> We are far from being alone!
> >> >>>
> >> > Welcome to the club, Gabriel!
> >> >>> Gabriel
> >>
> >> How lucky I am!
> >> I suspect there are many other people having this problem out there;
> >> they just don't notice these delays on small infrastructures, and
> >> this bug doesn't actually cause a connection error, but "only" an
> >> unacceptable delay on moderately to heavily loaded servers.
> >>
> >>
> >> >> OK, seems to be the same issue that Leo has (has nothing to do with
> >> >> the Brett / Marlon issue; the only common denominator is the 3000ms).
> >> >>
> >> > But Gabriel is also talking about 3 second timeouts on the client, as
> >> > Brett and I did. I have read Gabriel's description on the provided link
> >> > and it seems to be exactly the same problem. I think Brett can confirm
> >> > this ...
> >> >> This issue is probably caused by the server delivering a miscalculated
> >> >> SYN/ACK (the acked number is miscalculated, see my second mail).
> >> >>
> >> > When you look at my first tcpdump with two machines as server and client
> >> > then you can see that there are no miscalculated SYN/ACK packets from
> >> > the server (and therefore no RST packet from the client). All packets
> >> > have the right number but the client never receives the SYN/ACK packet
> >> > from the server. Only in the lo test are there RST packets and wrong
> >> > packet numbers. But as I told you in my last email I think this is a
> >> > different problem and not important for us. We should ignore the lo test
> >> > and concentrate on the "real" problem of Brett, Gabriel and myself (and
> >> > even a lot of other people out there).
> >>
> >> I confirm that there is no problem with the sequence numbers. Attached is
> >> the pcap-compatible capture of the relevant packets (608 bytes, 6
> >> packets total: 2 for the failed handshake, 3 for the successful one and
> >> 1 for the first mysql data packet). This capture has been filtered to
> >> show only the relevant packets and was done in promiscuous mode.
> >>
> >
> > I'm missing the tcpdump...
>
> Sorry, I forgot to include it when reformatting my e-mail. Here it is!
>
> Gabriel

The packets are OK. Still, how did you produce this situation? Let me
guess: you used one client to mass-produce connections to your
mysql-server, right?

//HW
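For reference, a minimal sketch of the kind of single-client connection generator being asked about here. It is not Brett's or Gabriel's actual test client: the helper names (start_listener, measure_connects), the loopback stand-in server and the 2.5 second threshold are all assumptions made for illustration. It opens short-lived TCP connections one after another, times each connect(), and flags the ones slow enough to suggest a retransmitted handshake.

```python
import socket
import threading
import time


def start_listener(host="127.0.0.1"):
    """Start a throwaway TCP listener on an ephemeral port.

    Hypothetical stand-in for the real server (e.g. mysqld); it accepts
    each connection and closes it immediately.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, 0))  # port 0: let the kernel pick a free port
    srv.listen(128)

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # listener was closed
                return
            conn.close()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv


def measure_connects(host, port, count, slow_threshold=2.5):
    """Open `count` short-lived connections and time each connect().

    Returns (latencies, slow), where `slow` holds the latencies in seconds
    at or above `slow_threshold` (an assumed cut-off for "about 3000ms").
    """
    latencies = []
    for _ in range(count):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=10):
            pass  # connect, then close right away
        latencies.append(time.monotonic() - start)
    slow = [t for t in latencies if t >= slow_threshold]
    return latencies, slow


if __name__ == "__main__":
    srv = start_listener()
    host, port = srv.getsockname()
    latencies, slow = measure_connects(host, port, count=200)
    print(f"{len(latencies)} connects, max {max(latencies) * 1000:.1f} ms, "
          f"{len(slow)} at or above the slow threshold")
    srv.close()
```

Pointing measure_connects at a real server instead of the local listener, and running it from one client versus several, would be one way to tell whether the delay depends on a single source address churning through ports.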