On 6/23/19 9:02 PM, Roman Gushchin wrote:
> On Sun, Jun 23, 2019 at 08:29:21PM -0700, Alexei Starovoitov wrote:
>> On 6/23/19 7:30 PM, Roman Gushchin wrote:
>>> Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf
>>> from cgroup itself"), cgroup_bpf release occurs asynchronously
>>> (from a worker context), and before the release of the cgroup itself.
>>>
>>> This introduced a previously non-existing race between the release
>>> and update paths, e.g. if a leaf's cgroup_bpf is released while a new
>>> bpf program is attached to one of its ancestor cgroups at the same
>>> time. The race may result in a double-free and other memory
>>> corruptions.
>>>
>>> To fix the problem, let's protect the body of cgroup_bpf_release()
>>> with cgroup_mutex, as it effectively was previously, when all this
>>> code was called from the cgroup release path with the cgroup mutex
>>> held.
>>>
>>> Also make sure that we don't leave already-freed pointers to the
>>> effective prog arrays behind. Otherwise, they can be released again
>>> by the update path. That wasn't necessary before, because previously
>>> the update path couldn't see such a cgroup, as cgroup_bpf and the
>>> cgroup itself were released together.
>>
>> I thought a dying cgroup won't have any children cgroups?
>
> It's not completely true: a dying cgroup can't have living children.
>
>> It should have been empty with no tasks inside it?
>
> Right.
>
>> Only some resources are still held?
>
> Right.
>
>> The mutex and the zero init are highly suspicious.
>> It feels like cgroup_bpf_release is called too early.
>
> An alternative solution is to bump the refcounter on
> every update path, and explicitly skip de-bpf'ed cgroups.
>
>>
>> Thinking from another angle... if child cgroups can still attach, then
>> this bpf_release is broken.
>
> Hm, what do you mean by attach? It's not possible to attach
> a new prog, but if a prog is attached to a parent cgroup,
> a pointer can spill through the "effective" array.
>
> But I agree, it's broken. The update path should ignore such
> cgroups (cgroups whose cgroup_bpf has been released). I'll take a look.
>
>> The code should be
>> calling __cgroup_bpf_detach() one by one to make sure
>> update_effective_progs() is called, since descendants are still
>> sort-of alive and can attach?
>
> Not sure I get you. A dying cgroup is a leaf cgroup.
>
>>
>> My money is on 'too early'.
>> Maybe the cgroup is not dying?
>> Just cgroup_sk_free() is called on the last socket and
>> this auto-detach logic got triggered incorrectly?
>
> So, once again, here is my picture:
>
> A
> A/B
> A/B/C
>
> cpu1:                            cpu2:
> rmdir C                          attach a new prog to A
> C is dying                       update A, update B, update C...
> C's cgroup_bpf is released       C's effective progs array is replaced
>                                  with a new one; the old one is
>                                  double-freed
>
> It looks like it can be reproduced without any sockets.

I see. Does that mean that css_for_each_descendant walks dying cgroups?
I guess the fix then is to avoid walking them in update_effective_progs?
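
To make the fix described in the commit message concrete, here is a minimal
sketch of cgroup_bpf_release() with its body under cgroup_mutex. The
structure loosely follows kernel/bpf/cgroup.c of that period; the per-type
prog list teardown and storage handling are elided, so treat this as
illustrative rather than the exact patch:

static void cgroup_bpf_release(struct work_struct *work)
{
	struct cgroup *cgrp = container_of(work, struct cgroup,
					   bpf.release_work);
	struct bpf_prog_array *old_array;
	unsigned int type;

	/* serialize against the attach/detach/update paths, which
	 * already run with cgroup_mutex held */
	mutex_lock(&cgroup_mutex);

	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
		/* ... detach and put the progs on cgrp->bpf.progs[type] ... */

		old_array = rcu_dereference_protected(
				cgrp->bpf.effective[type],
				lockdep_is_held(&cgroup_mutex));
		bpf_prog_array_free(old_array);
		/* don't leave a dangling pointer behind: if a concurrent
		 * update still walks into this cgroup, it must not free
		 * the array a second time */
		RCU_INIT_POINTER(cgrp->bpf.effective[type], NULL);
	}

	mutex_unlock(&cgroup_mutex);

	percpu_ref_exit(&cgrp->bpf.refcnt);
	cgroup_put(cgrp);
}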
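
And a sketch of the direction suggested by the closing question: making
update_effective_progs() skip descendants whose cgroup_bpf has already been
released. Using percpu_ref_is_zero() on bpf.refcnt as the "already released"
test is an assumption here, not necessarily what the final patch does:

static int update_effective_progs(struct cgroup *cgrp,
				  enum bpf_attach_type type)
{
	struct cgroup_subsys_state *css;
	int err;

	/* first pass: allocate new effective prog arrays */
	css_for_each_descendant_pre(css, &cgrp->self) {
		struct cgroup *desc = container_of(css, struct cgroup, self);

		/* a dying descendant whose cgroup_bpf was already
		 * released: its effective array is gone, so don't
		 * recompute or replace it */
		if (percpu_ref_is_zero(&desc->bpf.refcnt))
			continue;

		err = compute_effective_progs(desc, type,
					      &desc->bpf.inactive);
		if (err)
			goto cleanup;
	}
	/* ... second pass activating the new arrays, with the same
	 * skip for released descendants ... */
	return 0;

cleanup:
	/* ... free the arrays already allocated in the first pass ... */
	return err;
}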