On Wed, Mar 14, 2012 at 10:58 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote: >> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote: >> > This patch fixes the assumption that a mnt namespace will always have a valid >> > root object. >> >> It's not an assumption, it's an invariant that should hold unless you have >> run into a bug somewhere. >> >> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns(). >> There are only two callers - dup_mnt_namespace() and create_mnt_ns(). >> The latter will assign non-NULL vfsmount to ->root or die NULL pointer >> dereference in >> mnt->mnt_ns = new_ns; >> The former will either assign non-NULL to ->root or kfree() mnt_namespace >> before anyone can see it. >> >> And nothing should modify ->root after that assignment for as long as >> the instance of struct mnt_namespace is allocated. >> >> Mind explaining how have you managed to get mnt_namespace with NULL ->root >> passed to dup_mnt_ns()? > > As the matter of fact, all of the above is easily verified; very few places > see the internals of struct mnt_namespace (defined in fs/mount.h, included > only in fs/dcache.c, fs/fhandle.c, fs/namei.c, > fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c}, > and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and > fs/proc_namespace.c). Nothing else knows what size the damn thing is, > nevermind the actual layout. > > The lifetime rules are also simple and easy to verify: it's a plain refcount. > No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns(). > As far as the outside cares, > nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always > returns a new reference, either to exisiting instance or to freshly allocated > one. put_mnt_ns() needs to be called on the other side... > opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount > for as long as it's opened. > > This is it; we actually end up acquiring one reference too many to initial > mnt_namespace, but that's not going to cause such effect. > > The structure containing struct vfsmount has a pointer to struct mnt_namespace > in it. That one is protected by namespace_sem, does *not* contribute to > refcount and is cleared when vfsmount (OK, struct mount containing it) > gets removed from namespace's ->list and set when it gets placed there. > Anything still on mnt_ns->list will get kicked out of there by umount_tree() > call from put_mnt_ns(), so those references can't outlive the instance > they point to. > > And that's all pointers to mnt_namespace that ever exist, aside of function > arguments and local variables. I'm not saying that I couldn't have possibly > fucked it up, but from rereading that code it doesn't look like we could > end up with dangling pointers to already freed instances... I'm trying to find the exact chain of events leading it it at the moment, but it reproduces rather easily - so if you have any ideas on figuring it out I'm happy to try anything. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html