On 28.3.2022 18.05, Pablo Neira Ayuso wrote:
On Mon, Mar 28, 2022 at 05:08:32PM +0300, Topi Miettinen wrote:
On 28.3.2022 0.31, Pablo Neira Ayuso wrote:
On Sat, Mar 26, 2022 at 12:09:26PM +0200, Topi Miettinen wrote:
[...]
Another possibility would be to hook into the cgroup directory creation
logic in the kernel so that, when the cgroup is created, part of the path
checks are performed, or something else is done that would allow
non-existent cgroups to be used. Then the NFT syntax would not need
changing, and the expressions would "just work" even when loaded early.
Could you use inotify/dnotify/eventfd to track these updates from
userspace and update the nftables sets accordingly? AFAIK, this is
available to cgroupsv2.
It's possible, there's for example:
https://github.com/mk-fg/systemd-cgroup-nftables-policy-manager
This one seems to add one rule per cgroupv2; it would be better to use
a map for this purpose, for scalability reasons.
https://github.com/helsinki-systems/nft_cgroupv2/
The approach above takes us back to the linear ruleset evaluation
problem: it basically looks like iptables, and it does not scale up.
But I think that with this approach, depending on system load, there could
be a vulnerable time window where the rules aren't loaded yet but the
process which is supposed to be protected by the rules has already started
running. This isn't desirable for firewalls, so I'd like to have a way to
load the firewall rules as early as possible.
You could define a static ruleset which creates the table, basechain
and the cgroupv2 verdict map. Then, systemd updates this map with new
entries to match on cgroupsv2 and apply the corresponding policy for
this process, and deletes them when they are not needed anymore. You
have to define one non-basechain for each cgroupv2 policy.
So something like this:
table inet x {
        map dict {
                type cgroupsv2 : verdict;
        }
        chain y {
                socket cgroupv2 level 4 vmap @dict
        }
}
and then systemd would add an entry like {
"app-local\x2dfirefox\x2desr-01d5fcc2f9114e509e992cdaef3d84c3.scope" :
accept } to the vmap "dict" when realizing the cgroup?
-Topi
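
For illustration, adding and later removing such an entry from userspace
could look like the commands below. This is only a sketch under
assumptions: it reuses the table "x" and the map "dict" from the ruleset
above, the cgroup path is a made-up placeholder, and the key is assumed
to be the cgroup path relative to the cgroup2 mount point, with as many
components as the "level" used in the rule. As discussed earlier in the
thread, the cgroup has to exist at the time the element is added.

    # Hypothetical path; run when the cgroup has been realized.
    nft add element inet x dict \
        '{ "user.slice/user-1000.slice/user@1000.service/app.slice" : accept }'

    # Run when the cgroup is no longer needed.
    nft delete element inet x dict \
        '{ "user.slice/user-1000.slice/user@1000.service/app.slice" : accept }'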
To address the vulnerable time window, the static ruleset defines a
default policy to allow nothing until an explicit policy based on
cgroupv2 for this process is in place.
The cgroupv2 support for nftables was designed to be used with maps.
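
Putting these points together, a static ruleset loaded at boot might
look roughly like the sketch below. The output hook, the chain name
"app_policy" and the ports are assumptions chosen for illustration, not
something specified in this thread.

    table inet x {
            map dict {
                    type cgroupsv2 : verdict
            }

            # Hypothetical per-policy chain; one such non-basechain
            # would be defined for each cgroupv2 policy.
            chain app_policy {
                    tcp dport { 80, 443 } accept
            }

            chain y {
                    # With "policy drop", packets are dropped until an
                    # entry for the socket's cgroup is added to @dict.
                    type filter hook output priority filter; policy drop;
                    socket cgroupv2 level 4 vmap @dict
            }
    }

Entries added at runtime would then map a cgroup path to a verdict such
as "jump app_policy". Note that "policy drop" on the output hook also
drops all other locally generated traffic, loopback included, so a real
ruleset would need further rules for anything that must stay reachable.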