ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:

> Adding Rune Kleveland to the discussion as he also seems to have
> reproduced the issue.
>
> Alex and I have been staring at the code and the reports and this
> bug is hiding well. Here is what we have figured out so far.
>
> Both the warning from free_user_ns calling dec_ucount that Jordan Glover
> reported and the KASAN error that Yu Zhao has reported appear to have
> the same cause: using a ucounts structure after it has been freed and
> reallocated as something else.
>
> I have just skimmed through the recent report from Rune Kleveland
> and it also appears to be a use after free, especially since the
> second failure in the log is slub complaining about trying to free
> the ucounts data structure.
>
> We looked through the users of put_ucounts and we don't see any obvious
> buggy users that would be freeing the data structure early.
>
> Alex has tried to reproduce this but so far is not having any luck.
> Folks, can you tell us what compiler versions you are using and share
> your kernel config with us? That might help.
>
> The little debug diff below is my guess of what is happening. If the
> folks who can reproduce this issue can try the patch below and let me
> know if the warnings fire, that would be appreciated. It is still not
> enough to track down the bug, but at least it will confirm my current
> hypothesis about how things look before there is a use of memory after
> it is freed.

Bah. Scratch that test patch. I just double-checked myself, and
cred->ucounts and cred->user_ns->ucounts should never be equal, as the
user namespace is counted in its parent user namespace.

That observation now tells me I have a parent user namespace that has
been corrupted. Back to the drawing board.

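To make the argument above concrete, here is a rough userspace model of
the pointer relationships. It is a sketch, not kernel code: the structs
are cut down to the fields the argument needs, the helpers
cred_shares_ns_ucounts() and model_shares_ucounts() are hypothetical
names made up for illustration, and it assumes that cred->ucounts is
charged in cred->user_ns itself while a namespace's own ucounts is
charged in its parent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only, not kernel code. Struct and field names mirror
 * the kernel's, but the layout is reduced to what the argument needs.
 */
struct user_namespace;

struct ucounts {
	struct user_namespace *ns;	/* namespace the counts are charged in */
	int count;
};

struct user_namespace {
	struct user_namespace *parent;
	struct ucounts *ucounts;	/* charged in the parent namespace */
};

struct cred {
	struct user_namespace *user_ns;
	struct ucounts *ucounts;	/* assumed: charged in user_ns itself */
};

/* Hypothetical helper: the pointer equality the debug patch tested for. */
static int cred_shares_ns_ucounts(const struct cred *cred)
{
	return cred->ucounts && cred->ucounts == cred->user_ns->ucounts;
}

/* Hypothetical helper: build a parent/child pair and test the condition. */
static int model_shares_ucounts(void)
{
	struct user_namespace parent = { NULL, NULL };
	struct ucounts ns_counts = { &parent, 1 };	/* child ns counted in parent */
	struct user_namespace child = { &parent, &ns_counts };
	struct ucounts task_counts = { &child, 1 };	/* task counted in child */
	struct cred cred = { &child, &task_counts };

	/* The two ucounts belong to different namespaces, so they differ. */
	assert(cred.ucounts->ns != cred.user_ns->ucounts->ns);

	return cred_shares_ns_ucounts(&cred);
}
```

Under those assumptions model_shares_ucounts() returns 0: the two
ucounts pointers name different namespaces, so the equality test in the
debug patch could not be true in a healthy system.
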
> Thank you,
> Eric
>
> diff --git a/kernel/cred.c b/kernel/cred.c
> index f784e08c2fbd..e7ffaa3cf5a6 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -120,6 +120,12 @@ static void put_cred_rcu(struct rcu_head *rcu)
>  	if (cred->group_info)
>  		put_group_info(cred->group_info);
>  	free_uid(cred->user);
> +#if 1
> +	if ((cred->ucounts == cred->user_ns->ucounts) &&
> +	    (atomic_read(&cred->ucounts->count) == 1)) {
> +		WARN_ONCE(1, "put_cred_rcu: ucount count 1\n");
> +	}
> +#endif
>  	if (cred->ucounts)
>  		put_ucounts(cred->ucounts);
>  	put_user_ns(cred->user_ns);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 91a43e57a32e..60fd88b34c1a 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -743,6 +743,13 @@ void __noreturn do_exit(long code)
>  	if (unlikely(!tsk->pid))
>  		panic("Attempted to kill the idle task!");
>
> +#if 1
> +	if ((tsk->cred->ucounts == tsk->cred->user_ns->ucounts) &&
> +	    (atomic_read(&tsk->cred->ucounts->count) == 1)) {
> +		WARN_ONCE(1, "do_exit: ucount count 1\n");
> +	}
> +#endif
> +
>  	/*
>  	 * If do_exit is called because this processes oopsed, it's possible
>  	 * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before