Re: Most optimal method to dump UDP conntrack entries

On Tue, 12 Nov 2024 at 09:18, Florian Westphal <fw@xxxxxxxxx> wrote:
>
> Antonio Ojea <antonio.ojea.garcia@xxxxxxxxx> wrote:
> > On Tue, 12 Nov 2024 at 02:20, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 12, 2024 at 10:16:45AM +0100, Pablo Neira Ayuso wrote:
> > > > I guess the concern is that assured flows cannot be expelled from the
> > > > conntrack table via early_drop, that is why an expedited cleanup is
> > > > important?
> > >
> > > Actually, the issue is that packets could end up in a backend which
> > > no longer exists after re-configuration; therefore, removing the
> > > entry needs to happen so ongoing flows have a chance to talk to
> > > another (different) backend.
> >
> > Please take a look at the attached kselftest, which emulates the
> > problematic behavior in Kubernetes.
> >
> > I think that for UDP the NAT rule should take precedence over the
> > conntrack entry, unlike TCP, where it is important to preserve the
> > session once it has been established.
>
> Why? The peer is even alive in your test; from your initial description
> I thought this was about 'peer stops responding, but udp conntrack
> remains alive forever due to client-probes'.
>
> This is just silly, we can't make a change to auto-toss all mappings
> on a nat rule change.
>
> What do you do when someone uses random sampling and refreshes the
> mapping table?
>
> The kernel doesn't know what kind of upper-layer protocol is used; what
> if it's a stateful protocol that breaks when your packets get steered
> somewhere else mid-stream?
>
> Did you evaluate use of stateless NAT for your use case?  That would
> follow the rules 1:1, and thus would break or not depending on the
> protocol's expectations.
>
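Just to make sure I understand the stateless NAT suggestion: payload
mangling from the raw hook, with the reverse rewrite on the way out?
A minimal sketch, with made-up addresses (10.96.0.10 standing in for
the virtual service address, 10.244.0.5 for the backend):

  table ip stateless {
        chain pre {
                type filter hook prerouting priority -300; policy accept;
                # rewrite the virtual address to the backend and skip conntrack
                ip daddr 10.96.0.10 udp dport 53 ip daddr set 10.244.0.5 notrack
        }
        chain post {
                type filter hook postrouting priority 100; policy accept;
                # undo the translation for replies coming back from the backend
                ip saddr 10.244.0.5 udp sport 53 ip saddr set 10.96.0.10
        }
  }

Since no conntrack entry is created, every packet follows whatever the
current ruleset says, which is the 1:1 behavior described above: a rule
change takes effect immediately, for better or worse depending on what
the upper-layer protocol expects.
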
> For insanity like this I think we really can't do anything except offer
> an efficient conntrack table flush mechanism to avoid any loop in
> userspace.
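
For what it's worth, the userspace loop in question today boils down to
per-entry deletions with conntrack-tools whenever a backend is removed,
roughly like this (protocol, addresses and port are made up for the
example):

  # drop entries that were DNATed to the removed backend
  conntrack -D -p udp --reply-src 10.244.0.5

  # or, more coarsely, everything targeting the virtual service address
  conntrack -D -p udp --orig-dst 10.96.0.10 --dport 53

An efficient kernel-side flush keyed on that kind of filter is exactly
the mechanism that would make such a loop unnecessary.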

I recognize that I presented a very extreme case that is confusing
without the Kubernetes context, because the availability of the backends
in Kubernetes is represented by other high-level APIs; my apologies.

Let's forget about this use case, please, and let me try to redo the
test to represent the scenario where the backend stops replying, which
is still very useful.



