On Mon 25-03-19 18:56:35, Andrea Arcangeli wrote:
> MEMCG depends on the task structure not to be freed under
> rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
> mm->owner.

Please state the actual problem. Your cover letter mentions a race
condition. Please make it explicit in the changelog.

> An alternate possible fix would be to defer the delivery of the
> userfaultfd contexts to the monitor until after fork() is guaranteed
> to succeed. Such a change would require more changes because it would
> create a strict ordering dependency where the uffd methods would need
> to be called beyond the last potentially failing branch in order to be
> safe.

How many more changes are we talking about? Because ...

> This solution as opposed only adds the dependency to common code
> to set mm->owner to NULL and to free the task struct that was pointed
> by mm->owner with RCU, if fork ends up failing. The userfaultfd
> methods can still be called anywhere during the fork runtime and the
> monitor will keep discarding orphaned "mm" coming from failed forks in
> userland.

... this is adding a subtle hack that might break in the future because
copy_process error paths are far from trivial and quite error prone
IMHO. I am not opposed to the patch in principle but I would really like
to see what kind of solutions we are comparing here.

> This race condition couldn't trigger if CONFIG_MEMCG was set =n at
> build time.

All the CONFIG_MEMCG #ifdefs are just ugly as hell. Can we reduce them
please? E.g. use if (IS_ENABLED(CONFIG_MEMCG)) where appropriate?

[...]
> +static __always_inline void mm_clear_owner(struct mm_struct *mm,
> +					   struct task_struct *p)
> +{
> +#ifdef CONFIG_MEMCG
> +	if (mm->owner == p)
> +		WRITE_ONCE(mm->owner, NULL);
> +#endif

How can we ever hit this warning and what does that mean?

-- 
Michal Hocko
SUSE Labs