Re: [PATCH 1/2] userfaultfd: use RCU to free the task struct when fork fails

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 26 Mar 2019 09:56:43 +0100

On Mon 25-03-19 18:56:35, Andrea Arcangeli wrote:
> MEMCG depends on the task structure not to be freed under
> rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
> mm->owner.

Please state the actual problem. Your cover letter mentions a race
condition. Please make it explicit in the changelog.

> An alternate possible fix would be to defer the delivery of the
> userfaultfd contexts to the monitor until after fork() is guaranteed
> to succeed. Such a change would require more changes because it would
> create a strict ordering dependency where the uffd methods would need
> to be called beyond the last potentially failing branch in order to be
> safe.

How many more changes are we talking about? Because ...

> This solution as opposed only adds the dependency to common code
> to set mm->owner to NULL and to free the task struct that was pointed
> by mm->owner with RCU, if fork ends up failing. The userfaultfd
> methods can still be called anywhere during the fork runtime and the
> monitor will keep discarding orphaned "mm" coming from failed forks in
> userland.

... this is adding a subtle hack that might break in the future because
copy_process error paths are far from trivial and quite error-prone
IMHO. I am not opposed to the patch in principle but I would really like
to see what kind of solutions we are comparing here.

> This race condition couldn't trigger if CONFIG_MEMCG was set =n at
> build time.

All the #ifdef CONFIG_MEMCG noise is just ugly as hell. Can we reduce
that please? E.g. use if (IS_ENABLED(CONFIG_MEMCG)) where appropriate?
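
To illustrate the pattern I mean, here is a minimal userspace sketch. It
is not the patch itself: IS_ENABLED() below is a simplified stand-in for
the kernel macro from include/linux/kconfig.h, and struct demo_task,
struct demo_mm and demo_clear_owner() are hypothetical names made up for
the example. It also assumes the owner field exists unconditionally,
which is not the case for the real mm_struct, so treat it as a sketch of
the idea only.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's IS_ENABLED(). The real macro
 * also copes with undefined and =m options; this one requires the
 * option to be defined as 0 or 1.
 */
#define CONFIG_MEMCG 1
#define IS_ENABLED(option) (option)

/* Hypothetical, heavily reduced stand-ins for task_struct and mm_struct. */
struct demo_task {
	int pid;
};

struct demo_mm {
	struct demo_task *owner;
};

/*
 * Clear mm->owner if it still points at p. The check is an ordinary C
 * condition, so the compiler sees (and type-checks) the body in every
 * configuration and drops it as dead code when the option is off.
 */
static inline void demo_clear_owner(struct demo_mm *mm, struct demo_task *p)
{
	if (!IS_ENABLED(CONFIG_MEMCG))
		return;

	if (mm->owner == p)
		mm->owner = NULL;
}
```

The point is that the conditional lives in normal control flow rather
than in preprocessor blocks scattered through the callers.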

[...]

> +static __always_inline void mm_clear_owner(struct mm_struct *mm,
> +					   struct task_struct *p)
> +{
> +#ifdef CONFIG_MEMCG
> +	if (mm->owner == p)
> +		WRITE_ONCE(mm->owner, NULL);
> +#endif

How can we ever hit this warning and what does that mean?

-- 
Michal Hocko
SUSE Labs