Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Tue, 8 Sep 2020 14:55:36 +0200

Hi Lukas,

On 9/5/20 7:24 AM, Lukas Wunner wrote:
> On Fri, Sep 04, 2020 at 11:14:37PM +0200, Daniel Borkmann wrote:
>> On 9/4/20 6:21 PM, Lukas Wunner wrote:
[...]
>> The tc queueing layer which is below is not the tc egress hook; the
>> latter is for filtering/mangling/forwarding or helping the lower tc
>> queueing layer to classify.
>
> People want to apply netfilter rules on egress, so either we need an
> egress hook in the xmit path or we'd have to teach tc to filter and
> mangle based on netfilter rules.  The former seemed more straight-forward
> to me but I'm happy to pursue other directions.

I would strongly prefer something where nf integrates into the existing tc
hook, not only because reusing the hook would be better, but also because it
allows more flexible interaction between tc/BPF use cases and nf. To name one
example: consider two different entities in the system setting up the two.
One adds rules for nf ingress/egress on the phys device for the host fw, and
the other routes traffic into/from containers at the tc layer. Traffic going
into the host ns will then hit nf ingress and, on the egress side, the nf
egress part. However, traffic going to containers via the existing tc
redirect will not see nf ingress, as expected, but on the reverse path it
would incorrectly hit the nf egress hook, which is /not/ the case for
dev_queue_xmit() today. So there needs to be more flexible coordination
between the two, so that these subsystems don't step on each other and the
orchestration system can arrange those needs depending on the use case.

Conceptually, the tc and nf ingress/egress hooks would be the same anyway, in
the sense that each is some sort of list or array of callbacks performing
actions on the skb: passing it on, dropping it or forwarding it. So this
should be consolidated such that both can register into an array of callbacks
forming a processing pipeline that can be atomically swapped at runtime, and
then, similar to tc or LSMs, allow delegating or terminating the processing
in a generic way.
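
To make the idea of a shared, swappable callback array more concrete, here
is a minimal userspace sketch in C11. It is not kernel code and not a
proposed API: struct pkt stands in for the skb, and the verdict names, the
two example hooks and pipeline_swap()/pipeline_run() are all hypothetical.
A real implementation would also need something like an RCU grace period
before the old array can be freed, which the sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical verdicts, for illustration only. */
enum verdict {
    V_PASS, /* delegate to the next callback in the array */
    V_DROP, /* terminate processing, drop the packet */
    V_STOP  /* terminate processing, packet is taken (e.g. redirected) */
};

struct pkt { int mark; int len; }; /* stand-in for an skb */

typedef enum verdict (*hook_fn)(struct pkt *p);

#define MAX_HOOKS 8

struct pipeline {
    size_t n;
    hook_fn hooks[MAX_HOOKS];
};

/* The active pipeline. Readers load the pointer, writers exchange it. */
static _Atomic(struct pipeline *) active;

/* Publish a new pipeline in one step and hand back the previous one. */
static struct pipeline *pipeline_swap(struct pipeline *next)
{
    assert(next == NULL || next->n <= MAX_HOOKS);
    return atomic_exchange(&active, next);
}

/* Run the callbacks in order until one of them terminates processing. */
static enum verdict pipeline_run(struct pkt *p)
{
    struct pipeline *pl = atomic_load(&active);

    if (!pl)
        return V_PASS;
    for (size_t i = 0; i < pl->n; i++) {
        enum verdict v = pl->hooks[i](p);
        if (v != V_PASS)
            return v;
    }
    return V_PASS;
}

/* Example "tc-like" hook: marked packets are redirected, so stop here. */
static enum verdict tc_redirect(struct pkt *p)
{
    return p->mark == 1 ? V_STOP : V_PASS;
}

/* Example "nf-like" hook: drop oversized packets. */
static enum verdict nf_filter(struct pkt *p)
{
    return p->len > 1500 ? V_DROP : V_PASS;
}
```

With only nf_filter registered, a marked 9000-byte packet is dropped. After
swapping in an array of { tc_redirect, nf_filter }, the same packet is
terminated by the first callback and never reaches the nf-like one, which is
the kind of ordering an orchestration system would want to control.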

[...]
>> the case is rather if distros start adding DHCP
>> filtering rules by default there as per your main motivation then
>> everyone needs to pay this price, which is completely unreasonable
>> to perform in __dev_queue_xmit().
>
> So first you're saying that the patches are unnecessary and everything
> they do can be achieved with tc... and then you're saying distros are
> going to use the nft hook to filter DHCP by default, which will cost
> performance.  That seems contradictory.  Why aren't distros using tc
> today to filter DHCP?

Again, I'm not sure why you ask me; you guys brought up the lack of DHCP
filtering as the reason this hook is needed. My gut feeling as to why it is
not there today is that the use case was not strong enough on the nf or tc
layer for anyone to care to fix it over the last few decades (!). And if you
check a typical DHCP client that is present on major modern distros, like
systemd-networkd's DHCP client, it already implements filtering of malicious
packets via BPF at the socket layer, including checking for cookies in the
DHCP header that are set by the application itself to prevent spoofing [0].
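
As a rough illustration of what such a check looks like, here is the same
kind of test written as plain userspace C rather than as a BPF program. It
is not taken from [0]. The function name is hypothetical, and the sketch
assumes that the application-chosen value is the BOOTP transaction ID (xid),
checked together with the fixed DHCP magic cookie. It operates on the
BOOTP/DHCP message only, without the IP and UDP headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets into the BOOTP/DHCP message as laid out in RFC 2131. */
enum {
    BOOTP_OP_OFF     = 0,   /* message op code */
    BOOTP_XID_OFF    = 4,   /* transaction ID chosen by the client */
    BOOTP_COOKIE_OFF = 236, /* magic cookie preceding the options */
    BOOTP_MIN_LEN    = 240, /* fixed header plus magic cookie */
    BOOTREPLY        = 2
};

#define DHCP_MAGIC_COOKIE 0x63825363u

/* Read a 32-bit big-endian (network order) value. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: accept only replies that carry the xid this
 * client sent and the DHCP magic cookie; reject everything else.
 */
static bool dhcp_reply_acceptable(const uint8_t *msg, size_t len,
                                  uint32_t expected_xid)
{
    assert(msg != NULL);

    if (len < BOOTP_MIN_LEN)
        return false;
    if (msg[BOOTP_OP_OFF] != BOOTREPLY)
        return false;
    if (load_be32(msg + BOOTP_XID_OFF) != expected_xid)
        return false;
    return load_be32(msg + BOOTP_COOKIE_OFF) == DHCP_MAGIC_COOKIE;
}
```

A socket-level BPF filter performs comparisons of this kind in the kernel
before the packet is ever queued to the client's socket, so a reply with a
foreign xid is discarded without any netfilter or tc rule being involved.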

Thanks,
Daniel

  [0] https://github.com/systemd/systemd/blob/master/src/libsystemd-network/dhcp-network.c#L28