Re: Conntrack not matching properly - producing serious outages

On Thu, 2011-08-11 at 17:30 -0400, John A. Sullivan III wrote: 
> On Thu, 2011-08-11 at 22:41 +0200, Jozsef Kadlecsik wrote:
> > On Thu, 11 Aug 2011, John A. Sullivan III wrote:
> > 
> > > I've just begun to wade my way through SACK as Jozsef suggested after
> > > getting some sleep but I was able to catch a live one with logging
> > > enabled:
> > > 
> > > Aug 11 11:56:24 fw01 kernel: nf_ct_tcp: bad TCP checksum IN= OUT=
> > > SRC=95.172.228.42 DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52
> > > ID=29203 DF PROTO=TCP SPT=46721 DPT=441 SEQ=2834861284 ACK=3682327577
> > > WINDOW=1002 RES=0x00 ACK PSH URGP=0 OPT (0101080A01249B0846B0F23B)
> > 
> > That's Noop, Noop and Timestamp options and not SACK.
> > 
> > But the TCP checksum checking in conntrack says that the TCP checksum of 
> > the received packet is invalid, therefore it assigns the INVALID
> > state to the packet.
> Ah, so we do suspect that this is the culprit?
> >  
> > > Aug 11 11:56:24 fw01 kernel: INPUT INVALID IN=bond3 OUT=
> > > MAC=00:15:17:90:3c:0b:00:1c:58:ea:79:ff:08:00 SRC=95.172.228.42
> > > DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52 ID=29203 DF PROTO=TCP
> > > SPT=46721 DPT=441 WINDOW=1002 RES=0x00 ACK PSH URGP=0
> > > 
> > > Aug 11 11:56:24 fw01 kernel: No Match: IN=bond3 OUT=
> > > MAC=00:15:17:90:3c:0b:00:1c:58:ea:79:ff:08:00 SRC=95.172.228.42
> > > DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52 ID=29203 DF PROTO=TCP
> > > SPT=46721 DPT=441 WINDOW=1002 RES=0x00 ACK PSH URGP=0
> > > 
> > > Is this telling me that the reason the packet has been classified as
> > > INVALID is because the TCP checksum is bad? We are doing checksum
> > > offloading so I would think the checksum in the packet evaluated by the
> > > kernel would be irrelevant.  We also have no problem if the users run
> > > their sessions through an OpenVPN tunnel.
> > 
> > TCP checksum offloading does not discard incoming packets with invalid 
> > checksum.
> Hmm . . . I wonder if we have a card which is going bad. This came on
> all of a sudden.  I was planning to disable offloading anyway to see if
> it solved the problem; I'm just awaiting a tester.  I'll report back
> what I find.  I certainly appreciate all the help - John
> >  
> > > I'll be digging into SACK next but wonder if I'm staring at the smoking
> > > gun and just don't recognize it.  I can try disabling offloading but not
> > > right now as the system is in heavy production.  Thanks - John
> > <snip>
Thanks to everyone for their help and my apologies for not getting back
sooner - we've been up almost continually battling this problem.

It looks like the netfilter involvement was a red herring.  We disabled
checksumming and the INVALID packet problem went away, but the underlying
problem persists.  We still have hit-and-miss access and piles of duplicate
ACKs and retransmissions, but it does not appear to be netfilter related.
We are still trying to figure out what changed or if we have some failing
hardware.
Thanks again - John
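
For anyone reading this in the archive: the OPT (...) field in the
nf_ct_tcp log line quoted above can be decoded by hand, which is how one
can see that it is NOP, NOP, Timestamp rather than SACK. Below is a minimal
Python sketch of that decoding. The function name decode_tcp_options and
the KIND_NAMES table are my own illustration, not part of any netfilter
tool, and the table covers only a handful of common option kinds.

```python
import struct

# Subset of TCP option kinds; anything else is reported by number.
KIND_NAMES = {
    0: "EOL",
    1: "NOP",
    2: "MSS",
    3: "WScale",
    4: "SACK-Permitted",
    5: "SACK",
    8: "Timestamp",
}


def decode_tcp_options(hex_string):
    """Return a list of (name, payload_bytes) for a hex-encoded option block."""
    data = bytes.fromhex(hex_string)
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        name = KIND_NAMES.get(kind, "kind-%d" % kind)
        if kind == 0:
            # End of option list: nothing after this is meaningful.
            options.append((name, b""))
            break
        if kind == 1:
            # NOP is a single byte with no length field.
            options.append((name, b""))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option at offset %d" % i)
        length = data[i + 1]  # length covers the kind and length bytes too
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length at offset %d" % i)
        options.append((name, data[i + 2:i + length]))
        i += length
    return options


if __name__ == "__main__":
    # Hex string copied from the OPT (...) field of the log line above.
    for name, payload in decode_tcp_options("0101080A01249B0846B0F23B"):
        if name == "Timestamp" and len(payload) == 8:
            tsval, tsecr = struct.unpack("!II", payload)
            print("%s TSval=%d TSecr=%d" % (name, tsval, tsecr))
        else:
            print(name)
```

A SACK block would show up as kind 5 in the same field, so its absence
here is consistent with Jozsef's reading of the log.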
