Re: [PATCH] doc: Document that kernel may accept unimplemented expressions

On 9.4.2022 13.22, Florian Westphal wrote:
Topi Miettinen <toiwoton@xxxxxxxxx> wrote:
Note that the kernel may accept expressions without errors even if it
doesn't implement the feature. For example, input chain filters using
*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
expressions are silently accepted but they don't work yet, except when used
with *tproxy* rules, early demultiplexing or BPF programs.

This is what iptables-extensions(8) says:

IMPORTANT: when being used in the INPUT chain, the cgroup matcher is
currently only of limited functionality, meaning it will only match on
packets that are processed for local sockets through early socket
demuxing. Therefore, general usage on the INPUT chain is not advised
unless the implications are well understood.

For nftables, this is true for all meta types that use skb->sk internally,
such as skuid, skgid, cgroup, ...

Could you please explain this 'early demux' concept? Is this something that
can be triggered with NFT rules, kernel configuration etc? I can't find any
documentation.

It is controlled via sysctl:
net.ipv4.ip_early_demux = 1
net.ipv4.tcp_early_demux = 1
net.ipv4.udp_early_demux = 1

This is a performance optimization. TCP early demux only works for
sockets in established state, and UDP early demux has restrictions as well.

So, no guarantee that this will set the socket reliably, hence the
paragraph in the iptables-extensions manpage.
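
For anyone wanting to check how these knobs are set on a given machine, here is a minimal sketch in Python. It assumes the usual mapping of dotted sysctl names to files under /proc/sys; the helper names (sysctl_path, read_early_demux) are made up for illustration and are not part of any tool. Note that a value of 1 only means the optimization is enabled; as said above, it does not guarantee that the socket is set for a given packet.

```python
from pathlib import Path

# The three sysctls named in this thread.
EARLY_DEMUX_KEYS = (
    "net.ipv4.ip_early_demux",
    "net.ipv4.tcp_early_demux",
    "net.ipv4.udp_early_demux",
)


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl name to its file under root (hypothetical helper)."""
    return Path(root).joinpath(*key.split("."))


def read_early_demux(root="/proc/sys"):
    """Return {sysctl name: value string, or None if the file is unreadable}."""
    result = {}
    for key in EARLY_DEMUX_KEYS:
        try:
            result[key] = sysctl_path(key, root).read_text().strip()
        except OSError:
            # Missing file (older kernel, non-Linux system) or no permission.
            result[key] = None
    return result


if __name__ == "__main__":
    for key, value in read_early_demux().items():
        shown = value if value is not None else "unavailable"
        print(f"{key} = {shown}")
```

The root parameter exists only so the function can be pointed at a different directory tree, for example when testing.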


Thanks. From this blog post I suppose the problem is that NFT rules aren't checked after final demux:
https://www.privateinternetaccess.com/blog/linux-networking-stack-from-the-ground-up-part-4-2/

Would it be possible to add such checks in the future?

What about:

Note that the kernel may accept expressions without errors even if it
doesn't implement the feature. For example, input chain filters using
expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or
*socket cgroupv2* are silently accepted but they don't work reliably
yet, except when used with *tproxy* rules, early demultiplexing
(which for TCP works only for sockets in established state, and for UDP
has restrictions as well) or BPF programs.

-Topi


