On Fri, 2011-08-12 at 13:12 -0400, John A. Sullivan III wrote:
> On Thu, 2011-08-11 at 17:30 -0400, John A. Sullivan III wrote:
> > On Thu, 2011-08-11 at 22:41 +0200, Jozsef Kadlecsik wrote:
> > > On Thu, 11 Aug 2011, John A. Sullivan III wrote:
> > >
> > > > I've just begun to wade my way through SACK as Jozsef suggested after
> > > > getting some sleep but I was able to catch a live one with logging
> > > > enabled:
> > > >
> > > > Aug 11 11:56:24 fw01 kernel: nf_ct_tcp: bad TCP checksum IN= OUT=
> > > > SRC=95.172.228.42 DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52
> > > > ID=29203 DF PROTO=TCP SPT=46721 DPT=441 SEQ=2834861284 ACK=3682327577
> > > > WINDOW=1002 RES=0x00 ACK PSH URGP=0 OPT (0101080A01249B0846B0F23B)
> > >
> > > That's Noop, Noop and Timestamp options and not SACK.
> > >
> > > But the TCP checksum checking in conntrack says that the TCP checksum of
> > > the received packet is invalid, therefore it assigns the INVALID
> > > state to the packet.
> > Ah, so we do suspect that this is the culprit?
> > >
> > > > Aug 11 11:56:24 fw01 kernel: INPUT INVALID IN=bond3 OUT=
> > > > MAC=00:15:17:90:3c:0b:00:1c:58:ea:79:ff:08:00 SRC=95.172.228.42
> > > > DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52 ID=29203 DF PROTO=TCP
> > > > SPT=46721 DPT=441 WINDOW=1002 RES=0x00 ACK PSH URGP=0
> > > >
> > > > Aug 11 11:56:24 fw01 kernel: No Match: IN=bond3 OUT=
> > > > MAC=00:15:17:90:3c:0b:00:1c:58:ea:79:ff:08:00 SRC=95.172.228.42
> > > > DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52 ID=29203 DF PROTO=TCP
> > > > SPT=46721 DPT=441 WINDOW=1002 RES=0x00 ACK PSH URGP=0
> > > >
> > > > Is this telling me that the reason the packet has been classified as
> > > > INVALID is because the TCP checksum is bad? We are doing checksum
> > > > offloading so I would think the checksum in the packet evaluated by the
> > > > kernel would be irrelevant. We also have no problem if the users run
> > > > their sessions through an OpenVPN tunnel.
> > >
> > > TCP checksum offloading does not discard incoming packets with invalid
> > > checksum.
> > Hmm . . . I wonder if we have a card which is going bad. This came on
> > all of a sudden. I was planning to disable offloading anyway to see if
> > it solved the problem; I'm just awaiting a tester. I'll report back
> > what I find. I certainly appreciate all the help - John
> > >
> > > > I'll be digging into SACK next but wonder if I'm staring at the smoking
> > > > gun and just don't recognize it. I can try disabling offloading but not
> > > > right now as the system is in heavy production. Thanks - John
> > > <snip>
> Thanks to everyone for their help and my apologies for not getting back
> sooner - we've been up almost continually battling this problem.
>
> It looks like the netfilter involvement was a red herring. We disabled
> checksumming and the INVALID packet problem went away but the problem
> persists. We have hit and miss access and piles of duplicate ACKs and
> retransmissions but it does not appear to be netfilter related. Still
> trying to figure out what changed or if we have some failing hardware.
> Thanks again - John
<snip>

Looks like it might be a malfunctioning trunk port. That would explain
the wild randomness of the problem. Thanks again, all. I certainly
learned (and internally documented) a lot about troubleshooting
conntrack with your help - John
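
For anyone who wants to check the OPT field from the nf_ct_tcp log line quoted in this thread, here is a minimal Python sketch of the decoding. It is not taken from the kernel or from netfilter; the function name parse_tcp_options and the small table of option names are assumptions made for illustration only. It walks the kind/length/value layout of the TCP options field and should show that 0101080A01249B0846B0F23B is two NOPs followed by a Timestamp option, with no SACK block.

```python
import struct

# Hypothetical lookup table for illustration; only a few common option kinds.
OPTION_NAMES = {
    0: "EOL",
    1: "NOP",
    2: "MSS",
    3: "WScale",
    4: "SACK-Permitted",
    5: "SACK",
    8: "Timestamp",
}


def parse_tcp_options(hex_string):
    """Decode a hex dump of the TCP options field into (name, payload) pairs."""
    data = bytes.fromhex(hex_string)
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        name = OPTION_NAMES.get(kind, "kind-%d" % kind)
        if kind == 0:
            # End of option list: a single byte, nothing follows.
            options.append((name, b""))
            break
        if kind == 1:
            # NOP: a single padding byte with no length field.
            options.append((name, b""))
            i += 1
            continue
        # Every other option carries a length byte that counts the kind
        # and length bytes as well as the payload.
        if i + 1 >= len(data):
            raise ValueError("truncated option at offset %d" % i)
        length = data[i + 1]
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length at offset %d" % i)
        options.append((name, data[i + 2:i + length]))
        i += length
    return options


if __name__ == "__main__":
    for name, payload in parse_tcp_options("0101080A01249B0846B0F23B"):
        if name == "Timestamp" and len(payload) == 8:
            # Timestamp payload is two 32-bit big-endian values.
            tsval, tsecr = struct.unpack("!II", payload)
            print(name, "TSval=%d TSecr=%d" % (tsval, tsecr))
        else:
            print(name, payload.hex())
```

The sketch only names a handful of option kinds and does no validation beyond the length byte, so treat it as a reading aid for log lines rather than a full parser.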