On Mon, Apr 05, 2021 at 01:44:37PM +0200, Christian Brauner wrote:
> On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro wrote:
> > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote:
> >
> > > > Yeah, I have at least namei.o
> > > >
> > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing
> > >
> > > *grumble*
> > >
> > > Is it reproducible without KASAN? Would be much easier to follow the produced
> > > asm...
> >
> > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e.
> > nd->inode == NULL.
>
> Yeah, I already saw that.
> >
> > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in
> > link_path_walk() and trying to reproduce that oops?
>
> Yep, no problem. If you run the reproducer in a loop for a little while
> you eventually trigger the BUG_ON() and then you get the following splat
> (and then an endless loop) in [1] with nd->inode NULL.
>
> _But_ I managed to debug this further and was able to trigger the BUG_ON()
> directly in path_init() in the AT_FDCWD branch (after all its AT_FDCWD(./file0)
> with the patch in [3] (it's in LOOKUP_RCU) the corresponding splat is in [2].
> So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the
> PF_IO_WORKER's pwd (The PF_IO_WORKER seems to be in async context.).

So we find current->fs->pwd.dentry negative, with current->fs->seq sampled
equal before and after that? Lovely...

The only places where we assign anything to ->pwd.dentry are

void set_fs_pwd(struct fs_struct *fs, const struct path *path)
{
        struct path old_pwd;

        path_get(path);
        spin_lock(&fs->lock);
        write_seqcount_begin(&fs->seq);
        old_pwd = fs->pwd;
        fs->pwd = *path;
        write_seqcount_end(&fs->seq);
        spin_unlock(&fs->lock);

        if (old_pwd.dentry)
                path_put(&old_pwd);
}

where we have ->seq bumped between dget new/assignment/dput old,
copy_fs_struct(), where we have

        spin_lock(&old->lock);
        fs->root = old->root;
        path_get(&fs->root);
        fs->pwd = old->pwd;
        path_get(&fs->pwd);
        spin_unlock(&old->lock);

fs being a freshly allocated instance that couldn't have been observed by
anyone, and chroot_fs_refs(), where we have

        spin_lock(&fs->lock);
        write_seqcount_begin(&fs->seq);
        hits += replace_path(&fs->root, old_root, new_root);
        hits += replace_path(&fs->pwd, old_root, new_root);
        write_seqcount_end(&fs->seq);
        while (hits--) {
                count++;
                path_get(new_root);
        }
        spin_unlock(&fs->lock);
...
static inline int replace_path(struct path *p,
                               const struct path *old, const struct path *new)
{
        if (likely(p->dentry != old->dentry || p->mnt != old->mnt))
                return 0;
        *p = *new;
        return 1;
}

Here we have new_root->dentry pinned from the very beginning, and the
assignments are wrapped into bumps of ->seq. Moreover, we are holding ->lock
through that sequence (as all writers do), so these references can't be
dropped before path_get() bumps new_root->dentry refcount.

chroot_fs_refs() is called only by pivot_root(2):

        chroot_fs_refs(&root, &new);

and there new is set by

        error = user_path_at(AT_FDCWD, new_root,
                             LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
        if (error)
                goto out0;

which pins new.dentry *and* verifies that it's positive and a directory, at
that. Since a pinned positive dentry can't be made negative by anybody else,
we know it will remain in that state until

        path_put(&new);

well downstream of chroot_fs_refs().

In copy_fs_struct() we are copying someone's ->pwd, so it's also pinned positive.
And it won't be dropped outside of old->lock, so by the time somebody manages
to drop the reference in old, path_get() effects will be visible (old->lock
serving as a barrier).

That leaves set_fs_pwd() calls:

fs/init.c:54:   set_fs_pwd(current->fs, &path);
        init_chdir(), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive.

fs/namespace.c:4207:    set_fs_pwd(current->fs, &root);
        init_mount_tree(), root.dentry being ->mnt_root of rootfs. Pinned
        positive (and it would've oopsed much earlier had that been it).

fs/namespace.c:4485:    set_fs_pwd(fs, &root);
        mntns_install(), root filled by successful LOOKUP_DOWN for "/" from
        mnt_ns->root. Should be pinned positive.

fs/open.c:501:  set_fs_pwd(current->fs, &path);
        chdir(2), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive.

fs/open.c:528:  set_fs_pwd(current->fs, &f.file->f_path);
        fchdir(2), file->f_path of any opened file. Pinned positive.

kernel/usermode_driver.c:130:   set_fs_pwd(current->fs, &umd_info->wd);
        umd_setup(), ->wd.dentry equal to ->wd.mnt->mnt_root, should be
        pinned positive.

kernel/nsproxy.c:509:   set_fs_pwd(me->fs, &nsset->fs->pwd);
        commit_nsset(). Let's see what's going on there...

        if ((flags & CLONE_NEWNS) && (flags & ~CLONE_NEWNS)) {
                set_fs_root(me->fs, &nsset->fs->root);
                set_fs_pwd(me->fs, &nsset->fs->pwd);
        }

In those conditions nsset.fs has come from copy_fs_struct() done in
prepare_nsset(), and the only thing that might've been done to it would be
that set_fs_pwd() in mntns_install(). (I'm not fond of the entire nsset->fs
thing - looks like papering over bad calling conventions, but anyway.)

Now, I might've missed some insanity (direct assignments to ->pwd.dentry,
etc. - wouldn't be the first time io_uring folks went "layering? wassat?
we'll just poke in whatever we can reach"), but I don't see anything obvious
of that sort in the area...

OK, how about this: in path_init(), right after

        do {
                seq = read_seqcount_begin(&fs->seq);
                nd->path = fs->pwd;
                nd->inode = nd->path.dentry->d_inode;
                nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
        } while (read_seqcount_retry(&fs->seq, seq));

slap

        if (!nd->inode) { // should never, ever happen
                struct dentry *fucked = nd->path.dentry;

                printk(KERN_ERR "%pd4 %d %x %p %d %d", fucked,
                        d_count(fucked), fucked->d_flags, fs, fs->users, seq);
                BUG_ON(1);
                return ERR_PTR(-EINVAL);
        }

and see what it catches?
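FWIW, a complementary probe on the write side - purely a sketch, not part of
the check proposed above - would be to have set_fs_pwd() itself scream if any
of the callers listed above ever hands it a negative dentry. The choice of
WARN_ON_ONCE() and d_is_negative() as the check is an assumption here, not
something taken from this thread:

/*
 * Sketch only: the same set_fs_pwd() quoted earlier, with a write-side
 * check bolted on.  Per the call-site survey above, every caller is
 * supposed to pass a pinned positive path, so this should never fire.
 */
void set_fs_pwd(struct fs_struct *fs, const struct path *path)
{
        struct path old_pwd;

        /* caller is expected to hold a reference on *path already */
        WARN_ON_ONCE(d_is_negative(path->dentry));

        path_get(path);
        spin_lock(&fs->lock);
        write_seqcount_begin(&fs->seq);
        old_pwd = fs->pwd;
        fs->pwd = *path;
        write_seqcount_end(&fs->seq);
        spin_unlock(&fs->lock);

        if (old_pwd.dentry)
                path_put(&old_pwd);
}

If the path_init() check fires while this one stays quiet, that would point
at ->pwd getting corrupted after installation rather than a negative path
being installed in the first place.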