Re: Conntrack insertion race conditions -- any workarounds?

André Paulsberg-Csibi (IBM Consultant) <Andre.Paulsberg-Csibi@xxxxxxxx> · Thu, 6 Sep 2018 13:19:36 +0000

I am not sure I agree that this is a race condition, but I might be wrong here.

Based on what I assume is normal UDP behavior, I would expect the two requests, one for the A record and one for the AAAA record, to use two separate source ports.
That should result in two separate conntrack entries, so the two packets would not race each other for any entry.
(This is my understanding; correct me if the assumption is wrong and tcpdumps actually show that the same UDP source port is used.)
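
One way to sanity-check this assumption is a small loopback experiment. The sketch below (Python, standard library only; the helper name `observed_source_ports` is made up for illustration) sends two datagrams from a single UDP socket, then one datagram each from two separate sockets, and reports the source ports the receiver sees. It only illustrates generic socket behavior: datagrams sent from one socket share a source port and therefore a connection tuple, while two sockets open at the same time get different ports. Which pattern the resolver in the affected containers follows still has to be confirmed with a capture.

```python
import socket


def observed_source_ports():
    """Return (ports_seen_from_one_socket, ports_seen_from_two_sockets)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # let the OS pick a free port
    receiver.settimeout(2.0)
    dest = receiver.getsockname()

    def recv_port():
        # recvfrom returns (data, (source_ip, source_port))
        _, (_, port) = receiver.recvfrom(512)
        return port

    # Case 1: one socket sends both "queries".
    shared = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    shared.bind(("127.0.0.1", 0))
    shared.sendto(b"A?", dest)
    shared.sendto(b"AAAA?", dest)
    same = [recv_port(), recv_port()]
    shared.close()

    # Case 2: two sockets, both open at the same time, send one "query" each.
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    first.bind(("127.0.0.1", 0))
    second.bind(("127.0.0.1", 0))
    first.sendto(b"A?", dest)
    second.sendto(b"AAAA?", dest)
    separate = [recv_port(), recv_port()]
    first.close()
    second.close()

    receiver.close()
    return same, separate


if __name__ == "__main__":
    same, separate = observed_source_ports()
    print("one socket: ", same)
    print("two sockets:", separate)
```

If the capture shows both queries leaving from one source port, they share a tuple and the race described in the original message becomes possible.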

Best regards
André Paulsberg-Csibi
Senior Network Engineer 
IBM Services AS

-----Original message-----
From: netfilter-owner@xxxxxxxxxxxxxxx <netfilter-owner@xxxxxxxxxxxxxxx> On behalf of Kyle Larose
Sent: Thursday, 6 September 2018 15:01
To: netfilter@xxxxxxxxxxxxxxx
Subject: Conntrack insertion race conditions -- any workarounds?

Hello,

I have been using nfqueue in conjunction with conntrack to monitor/police flows on containers in my kubernetes cluster. This worked until I started pushing UDP traffic through my nfqueue service.
At that point, I began to experience issues with DNS queries -- they would take forever!

In particular, I noticed that two queries would come out almost in parallel: an A and a AAAA query. The AAAA would almost always get dropped after going through nfqueue. After proving to myself that my service wasn't at fault, I started digging, and came across a few posts discussing the issue. For example, see [cut link]

Basically, I'm running into an issue within conntrack whereby two packets with the same connection tuple race to enter the table. The loser is dropped. I confirmed that I was hitting this condition by checking the conntrack stats, which show "insert_failed" and "drop" increasing every time the condition occurs. The counters do not increase otherwise.
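
A minimal sketch of how those two counters could be watched from a script. It assumes the kernel exposes the per-CPU statistics table at `/proc/net/stat/nf_conntrack`, laid out as a header row of counter names followed by one row of hexadecimal values per CPU. The function name `sum_conntrack_counters` and the sample text are illustrative only; the sample is made up and shortened, not output from the system described here.

```python
def sum_conntrack_counters(text, names=("insert_failed", "drop")):
    """Sum the named counters across all per-CPU rows of the stats table."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return {name: 0 for name in names}
    header, rows = lines[0], lines[1:]
    totals = {}
    for name in names:
        idx = header.index(name)  # raises ValueError if the column is absent
        totals[name] = sum(int(row[idx], 16) for row in rows)  # values are hex
    return totals


# Made-up, abbreviated sample: the real table has more columns.
SAMPLE = """\
entries insert insert_failed drop early_drop
0000000a 00000000 00000003 00000003 00000000
0000000a 00000000 0000000a 00000001 00000000
"""

if __name__ == "__main__":
    print(sum_conntrack_counters(SAMPLE))
    # On a live system (path is an assumption, see above):
    # with open("/proc/net/stat/nf_conntrack") as f:
    #     print(sum_conntrack_counters(f.read()))
```

Reading the totals before and after a slow DNS lookup and comparing them would show whether the drop coincides with the lookup.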

Right now I am running Ubuntu 18.04, with its stock kernel: 4.15.0-32-generic.

I understand that a few fixes for this issue are in progress, or have been merged into the kernel. However, I do not have control over the kernel I will be running, so there is a good chance that any fixes will not be in place.

I'm wondering if anyone has suggestions for workarounds I could put in place? The most promising one I saw involved using tc to place a delay on AAAA packets. However, I could not get that to work -- my service runs on traffic *leaving* the container, meaning that a rule on egress from the interface is too late. I could not figure out how to force traffic entering from a local process to hit tc prior to going through conntrack. I'm also concerned that other UDP services may hit the same issue if they exhibit similar traffic patterns.

Some thoughts I have right now are:
1. Add a "delay" queue where my service delays the AAAA packets prior to punting them to the main queue.
2. Don't use conntrack at all (which will really hurt performance -- I don't need to see every packet)
3. Use something other than nfqueue (does anyone have suggestions for alternatives which would allow me to see the L3 contents of packets inline, and possibly decide on them?)
4. ???
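
For option 1, the service would first have to tell AAAA queries apart from A queries. A minimal sketch, assuming the service already holds the raw UDP payload of a packet addressed to port 53 (how the payload is obtained depends on the nfqueue binding in use and is not shown). `dns_query_type` and `build_query` are hypothetical helper names; the QTYPE values 1 (A) and 28 (AAAA) come from the DNS specifications.

```python
import struct

QTYPE_A = 1
QTYPE_AAAA = 28


def dns_query_type(payload):
    """Return the QTYPE of the first question, or None if not a parseable query."""
    if len(payload) < 12:                 # DNS header is 12 bytes
        return None
    _txid, flags, qdcount = struct.unpack("!HHH", payload[:6])
    if flags & 0x8000 or qdcount < 1:     # QR bit set means it is a response
        return None
    pos = 12
    while True:                           # walk the length-prefixed labels
        if pos >= len(payload):
            return None
        length = payload[pos]
        if length == 0:                   # root label ends the name
            pos += 1
            break
        if length > 63:                   # compression pointer or invalid label
            return None
        pos += 1 + length
    if pos + 4 > len(payload):
        return None
    qtype, _qclass = struct.unpack("!HH", payload[pos:pos + 4])
    return qtype


def build_query(name, qtype, txid=0x1234):
    """Build a minimal DNS query, used here only to exercise the parser."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


if __name__ == "__main__":
    print(dns_query_type(build_query("example.com", QTYPE_A)))
    print(dns_query_type(build_query("example.com", QTYPE_AAAA)))
```

A service following option 1 could hold back packets for which `dns_query_type` returns `QTYPE_AAAA` for a short interval before issuing its verdict. Whether that delay is enough to avoid the insertion race would need to be tested.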

Any help is greatly appreciated.

Thanks!

Kyle