On Mon, Apr 02, 2018 at 07:40:22PM +0900, Tetsuo Handa wrote:
> That commit assumes that calling kill_sb() from deactivate_locked_super(s)
> without corresponding fill_super() is safe. We have so far crashed with
> rpc_mount() and kernfs_mount_ns(). Is that really safe?

Consider the case when fill_super() returns an error immediately.
It is exactly the same situation.  And ->kill_sb() *is* called in cases
when fill_super() has failed.  Always had been - it's much less boilerplate
that way.

deactivate_locked_super() on that failure exit is the least painful
variant, unfortunately.

Filesystems with ->kill_sb() instances that rely upon something done
between sget() and the first failure exit after it need to be fixed.
And yes, that should've been spotted back then.  Sorry.

Fortunately, we don't have many of those - kill_{block,litter,anon}_super()
are safe and those are the majority.  Looking through the rest uncovers
some bugs; so far all I've seen were already there.  Note that normally
we have something like

static void affs_kill_sb(struct super_block *sb)
{
	struct affs_sb_info *sbi = AFFS_SB(sb);
	kill_block_super(sb);
	if (sbi) {
		affs_free_bitmap(sb);
		affs_brelse(sbi->s_root_bh);
		kfree(sbi->s_prefix);
		mutex_destroy(&sbi->s_bmlock);
		kfree(sbi);
	}
}

which basically does one of the safe ones augmented with something that
takes care *not* to assume that e.g. ->s_fs_info has been allocated.
Not everyone does, though:

jffs2_fill_super():
	c = kzalloc(sizeof(*c), GFP_KERNEL);
	if (!c)
		return -ENOMEM;

in the very beginning.  So we can return from it with NULL ->s_fs_info.
Now, consider

	struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
	if (!(sb->s_flags & MS_RDONLY))
		jffs2_stop_garbage_collect_thread(c);

in jffs2_kill_sb() and

void jffs2_stop_garbage_collect_thread(struct jffs2_sb_info *c)
{
	int wait = 0;
	spin_lock(&c->erase_completion_lock);
	if (c->gc_task) {

IOW, fail that kzalloc() (or, indeed, an allocation in register_shrinker())
and eat an oops.
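The affs-style rule (->kill_sb() must not assume ->s_fs_info exists) can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: every demo_* name is hypothetical, calloc()/free() stand in for kzalloc()/kfree(), a bool stands in for the GC thread, and demo_mount() plays the role of the mount path that ends up in ->kill_sb() (via deactivate_locked_super()) even when fill_super() fails on its very first allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kernel structures; not kernel API. */
struct demo_sb_info {
	bool gc_running;        /* stands in for c->gc_task */
};

struct demo_super_block {
	void *s_fs_info;        /* NULL until fill_super() allocates it */
	bool read_only;         /* stands in for sb->s_flags & MS_RDONLY */
};

/* Counts how many times teardown actually found and freed s_fs_info. */
static int teardown_count;

/* Models jffs2_fill_super(): the very first step may fail, leaving s_fs_info NULL. */
static int demo_fill_super(struct demo_super_block *sb, bool fail_alloc)
{
	struct demo_sb_info *c = fail_alloc ? NULL : calloc(1, sizeof(*c));

	if (!c)
		return -1;              /* -ENOMEM in the kernel */
	c->gc_running = !sb->read_only;
	sb->s_fs_info = c;
	return 0;
}

/* ->kill_sb() following the affs pattern: check s_fs_info before using it. */
static void demo_kill_sb(struct demo_super_block *sb)
{
	struct demo_sb_info *c;

	assert(sb != NULL);
	c = sb->s_fs_info;
	/* generic teardown (kill_block_super() in the kernel) would go here */
	if (c) {
		if (!sb->read_only)
			c->gc_running = false;  /* jffs2_stop_garbage_collect_thread() */
		free(c);
		sb->s_fs_info = NULL;
		teardown_count++;
	}
}

/* Models the mount path: ->kill_sb() runs even if fill_super() failed. */
static int demo_mount(struct demo_super_block *sb, bool fail_alloc)
{
	int err = demo_fill_super(sb, fail_alloc);

	if (err)
		demo_kill_sb(sb);       /* deactivate_locked_super() on the failure exit */
	return err;
}
```

With the `if (c)` guard removed, the failing-allocation path through demo_mount() dereferences NULL, which is the userspace analogue of the jffs2 oops described above.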
The jffs2 one has always been there, always hard to hit without fault
injectors and fortunately trivial to fix.

Similar in nfs_kill_super() calling nfs_free_server().
Similar in v9fs_kill_super() with v9fs_session_cancel()/v9fs_session_close() calls.
Similar in hypfs_kill_super(), afs_kill_super(), btrfs_kill_super(),
cifs_kill_sb() (all trivial to fix).

Aha...  nfsd_umount() is a new regression.

orangefs: old, trivial to fix.

cgroup_kill_sb(): old, hopefully easy to fix.  Note that kernfs_root_from_sb()
can bloody well return NULL, making cgroup_root_from_kf() oops.  Always had
been there.

AFAICS, after discarding the instances that do the right thing we are left with:
hypfs_kill_super, rdt_kill_sb, v9fs_kill_super, afs_kill_super, btrfs_kill_super,
cifs_kill_sb, jffs2_kill_sb, nfs_kill_super, nfsd_umount, orangefs_kill_sb,
proc_kill_sb, sysfs_kill_sb, cgroup_kill_sb, rpc_kill_sb.

Out of those, nfsd_umount(), proc_kill_sb() and rpc_kill_sb() are regressions.
So are rdt_kill_sb() and sysfs_kill_sb() (victims of the issue you've spotted
in kernfs_kill_sb()).  The rest are old (and I wonder if syzbot had been
catching those - they are also dependent upon a specific allocation failing
at the right time).
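The cgroup_kill_sb() case has the same shape: a lookup that can legitimately return NULL when fill_super() never got far enough, followed by a conversion that dereferences it unconditionally. A minimal userspace sketch of filtering the NULL first, assuming hypothetical demo_* types that only loosely mirror kernfs; it is not the kernel's kernfs_root_from_sb()/cgroup_root_from_kf() code and not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; field names loosely follow kernfs but are not the kernel's. */
struct demo_kernfs_node { void *priv; };
struct demo_kernfs_root { struct demo_kernfs_node *kn; };
struct demo_cgroup_root { int nr_kills; };

/* s_fs_info-like pointer that stays NULL if the mount failed early. */
struct demo_cg_super_block { struct demo_kernfs_root *root; };

/* Like kernfs_root_from_sb(): may legitimately return NULL. */
static struct demo_kernfs_root *demo_root_from_sb(struct demo_cg_super_block *sb)
{
	return sb->root;
}

/* Like cgroup_root_from_kf(): dereferences its argument, so NULL must be filtered first. */
static struct demo_cgroup_root *demo_cgroup_root_from_kf(struct demo_kernfs_root *kf_root)
{
	return kf_root->kn->priv;
}

/* Guarded ->kill_sb(): returns 1 if the cgroup root was torn down, 0 if there was nothing to do. */
static int demo_cgroup_kill_sb(struct demo_cg_super_block *sb)
{
	struct demo_kernfs_root *kf_root = demo_root_from_sb(sb);
	struct demo_cgroup_root *root;

	if (!kf_root)
		return 0;       /* fill_super() never set things up; only generic teardown applies */
	root = demo_cgroup_root_from_kf(kf_root);
	root->nr_kills++;       /* stands in for the real per-root teardown */
	return 1;
}
```

The early return keeps the "what if fill_super() bailed out" decision in one place, at the top of ->kill_sb(), instead of relying on every helper it calls to cope with NULL.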