Re: [PATCH bpf v2 1/3] bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode

Tejun Heo <tj@xxxxxxxxxx> · Thu, 9 Sep 2021 12:29:54 -1000

Hello,

On Thu, Sep 09, 2021 at 10:43:40PM +0200, Daniel Borkmann wrote:
...
> Generally, this mutual exclusiveness does not hold anymore in today's user
> environments and makes cgroup v2 usage from BPF side fragile and unreliable.
> This fix adds proper struct cgroup pointer for the cgroup v2 case to struct
> sock_cgroup_data in order to address these issues; this implicitly also fixes
> the tradeoffs being made back then with regards to races and refcount leaks
> as stated in bd1060a1d671, and removes the fallback, so that cgroup v2 BPF
> programs always operate as expected.
> 
>   [0] https://github.com/nestybox/sysbox/
>   [1] https://kind.sigs.k8s.io/
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Cc: David S. Miller <davem@xxxxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Martynas Pumputis <m@xxxxxxxxx>
> Cc: Stanislav Fomichev <sdf@xxxxxxxxxx>

While this does increase cgroup's footprint inside sock, I think it's worth
considering the following points:

1. It's clear now that we won't need more cgroup-related socket fields for
   network integration. cgroup2 membership tagging has proven flexible
   enough, especially in combination with bpf.

2. Users have been transitioning from cgroup1 to cgroup2, some gradually,
   which is why this multiplexing is becoming an issue. In time, as
   transitions progress further, we should be able to disable cgroup1 network
   controllers for many use cases.
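
To make the footprint trade-off concrete, here is a rough userspace sketch.
The type and function names are hypothetical and the layouts are simplified;
they are not the kernel's actual sock_cgroup_data definitions. The first type
multiplexes a cgroup2 pointer with the cgroup1 classid/prioidx in one 64-bit
word, the second carries a dedicated pointer next to the cgroup1 fields.

```c
#include <stddef.h>
#include <stdint.h>

struct cgroup;	/* opaque for this sketch */

/*
 * Hypothetical multiplexed layout: a single 64-bit word holds either
 * a cgroup2 pointer or the cgroup1 prioidx/classid pair, with a flag
 * byte telling the two apart.
 */
union skcd_multiplexed {
	uint64_t val;
	struct {
		uint8_t  is_data;	/* set when the word carries v1 data */
		uint8_t  padding;
		uint16_t prioidx;	/* cgroup1 net_prio */
		uint32_t classid;	/* cgroup1 net_cls */
	};
};

/*
 * Hypothetical dedicated-pointer layout: the cgroup2 pointer is always
 * valid, and the cgroup1 fields sit alongside it.
 */
struct skcd_dedicated {
	struct cgroup *cgroup;	/* cgroup2 */
	uint32_t classid;	/* cgroup1 net_cls */
	uint16_t prioidx;	/* cgroup1 net_prio */
};

/* Extra bytes per socket paid for the dedicated pointer. */
static size_t skcd_growth(void)
{
	return sizeof(struct skcd_dedicated) - sizeof(union skcd_multiplexed);
}
```

On a typical 64-bit ABI the dedicated layout in this sketch should come out
one pointer-sized word larger per socket; the actual cost in the kernel
depends on which cgroup1 network controllers are configured in.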

For the series,

Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

-- 
tejun