On Sat, Apr 12, 2014 at 03:15:39PM -0700, Eric W. Biederman wrote: > Can you explain which scenario you are thinking about with respect to a > failed modprobe? Completely made up example: static struct file_system_type foofs = { .mount = mount_foo, .kill_sb = kill_foo, }; static struct vfsmount *mnt; static __init int foo_init(void) { int err; err = init_some(); if (err < 0) return err; mnt = kern_mount(&foofs); if (IS_ERR(mnt)) { uninit_some(); return PTR_ERR(mnt); } err = init_some_more(); if (err < 0) { kern_umount(mnt); uninit_some(); return err; } printk(KERN_INFO "loaded foo"); return 0; } Now, think what happens if init_some_more() in the above fails. With the current mntput() semantics, everything works. After making mntput() (from kern_umount()) delayed until the return to userland, we end up with attempt to call kill_foo() after the memory where it code sits gets freed. For that matter, by that point we are not even guaranteed to reach it, since it comes as mnt->mnt_sb->s_type->kill_sb() and s_type points to freed memory. I'm not saying that we have something that would closely resemble this example, but it's not hard to vary it in a lot of ways, keeping the same problem. Basically, you need to audit all paths leading from failure exits in some module_init() to mntput() and figure out if delaying the effect of that mntput() would be safe there (== doesn't get delayed past the point where we destroy something needed for that fs shutdown). It's not *that* horrible, since not too many modules out there are declaring any fs types, but it needs to be done. In theory, you could also fall prey to something like this: type = get_fs_type("proc"); ns = kmalloc(...); /* fill *ns */ mnt = kern_mount_data(type, p); ... if (error) { kern_unmount(mnt); kfree(p); put_filesystem(type); } possibly with get_fs_type() replaced with some other way to get that pointer to fs type (defined elsewhere). E.g. for procfs it could be, say, task_active_pid_ns(current)->proc_mnt->mnt_sb->s_type, etc. Again, it's not impossible to audit (there's not a lot of places where struct file_system_type * is ever stored, there are few instances of struct file_system_type, all statically allocated, etc.), but it's a non-trivial amount of work. And I honestly don't know if we have any such places right now. Moreover, unless you feel like repeating that kind of audit every merge window, we'll need a some way of dealing with such situations. Something like flush_pending_mntput(fs_type), for example, documented as barrier to be used in such places might do, but if you can think of something more fool-proof... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html