Re: Commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1 "vrf: Reset skb conntrack connection on VRF rcv" breaks expected netfilter behaviour

Florian Westphal <fw@xxxxxxxxx> · Wed, 13 Oct 2021 14:58:27 +0200

Lahav Schlesinger <lschlesinger@xxxxxxxxxxxxx> wrote:
> The call to nf_reset_ct() I added was to match the existing call in the
> egress flow, which I didn't want to change in order to not break
> existing behaviour (which I unintentionally still did :-)).
> 
> Seems like any combination of calling nf_reset_ct() will lead to
> something breaking. So continuing on what Florian suggested, another
> possibility is to make the calls to nf_reset_ct() in both ingress and egress
> flow configurable (procfs or new flags to RTM_NEWLINK).
> 
> One benefit of this is that disabling nf_reset_ct() on the egress flow will
> mean no port SNAT will take place when SNAT rule is installed on a VRF
> (as I described in my original commit), which can break applications
> that depend on using a specific source port.

Looking at the original change, eb63ecc1706b3e094d0f57438b6c2067cfc299f2
"net: vrf: Drop conntrack data after pass through VRF device on Tx",
I wonder if that's not the real cause of the problem.

=========================
Locally originated traffic in a VRF fails in the presence of a POSTROUTING
rule. For example,

$ iptables -t nat -A POSTROUTING -s 11.1.1.0/24 -j MASQUERADE
$ ping -I red -c1 11.1.1.3
ping: Warning: source address might be selected on device other than red.
PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
ping: sendmsg: Operation not permitted
=========================

I think we first need selftest scripts that re-create the three scenarios
(the one reported by Eugene, the one outlined above, and the double-PAT one
Lahav fixed) before any code changes are tested.

It's tempting to just change the nf_reset_ct() done on egress to be
conditional on the ct->status snat bit and drop support for double-snat.
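
As a rough illustration of that idea, here is a userspace model, not a
patch. The struct and function names are made-up stand-ins, and the
direction of the check (keep the conntrack entry once source NAT has been
set up on it, reset otherwise) is only one possible reading of "conditional
on the snat bit". The only detail taken from the kernel is the bit position
of IPS_SRC_NAT_BIT.

```c
/* Userspace model only: simplified stand-ins, not the kernel definitions. */
#include <stdbool.h>
#include <stddef.h>

/* Same bit position as IPS_SRC_NAT_BIT in the kernel's conntrack status enum. */
#define MODEL_IPS_SRC_NAT (1UL << 4)

/* Hypothetical stand-in for struct nf_conn: only the status word matters here. */
struct model_ct {
	unsigned long status;
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
	struct model_ct *ct;	/* NULL when no conntrack entry is attached */
};

/* Stand-in for nf_reset_ct(): detach the conntrack entry from the packet. */
static void model_reset_ct(struct model_skb *skb)
{
	skb->ct = NULL;
}

/*
 * Assumed egress behaviour: if source NAT was already set up on the attached
 * entry, keep it so the second netfilter pass does not NAT again; otherwise
 * reset as today. Returns true when the entry was reset.
 */
static bool model_vrf_egress_reset(struct model_skb *skb)
{
	if (skb->ct && (skb->ct->status & MODEL_IPS_SRC_NAT))
		return false;

	model_reset_ct(skb);
	return true;
}
```

In this model a packet whose entry has the snat bit set leaves the VRF
device with the entry still attached, which is exactly the case where
double-snat would no longer be possible.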

Given Lahav's patch, double-snat probably never worked to begin with?