David Ahern <dsahern@xxxxxxxxx> wrote:
> On 10/19/21 5:49 AM, Florian Westphal wrote:
> > David Ahern <dsahern@xxxxxxxxx> wrote:
> >> Thanks for the detailed summary and possible solutions.
> >>
> >> NAT/MASQ rules with VRF were not really thought about during
> >> development; it was not a use case (or use cases) Cumulus or other NOS
> >> vendors cared about. Community users were popping up fairly early and
> >> patches would get sent, but no real thought about how to handle both
> >> sets of rules - VRF device and port devices.
> >>
> >> What about adding an attribute on the VRF device to declare which side
> >> to take -- rules against the port device or rules against the VRF device
> >> and control the nf resets based on it?
> >
> > This would need a way to suppress the NF_HOOK invocation from the
> > normal IP path. Any idea on how to do that? AFAICS there is no way to
> > get to the vrf device at that point, so no way to detect the toggle.
> >
> > Or did you mean to only suppress the 2nd conntrack round?
>
> My thought was that the newly inserted nf_reset_ct fixed one use case
> and breaks another, so the new attribute would control that call.

Right, but the new nf_reset_ct() calls are there to undo the 2nd NAT
transformation done in round 2. So: no round 2, no second NAT
transformation, and no need for the new nf_reset_ct().

I dislike the idea of treating locally originating flows differently
from forwarded ones.

Treating them the same causes asymmetry between ingress and egress,
i.e. ingress means 'traverse conntrack for lower device' whereas egress
means 'traverse conntrack via vrf device'.

I could hack the NAT core and the conntrack commit hook to skip
functionality if the outdev is a vrf device -- that should in theory
result in consistent semantics, i.e. conntrack only runs in lower
device context.

I'll give that a shot unless someone has a better idea.