Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > On Wed, Apr 09, 2014 at 03:58:25PM -0700, Eric W. Biederman wrote: >> >> mntput as part of pathput is called from all over the vfs sometimes as >> in the case of symlink chasing from some rather deep call chains. >> During filesystem unmount with the right set of races those innocuous >> little mntput calls that take very little stack space can become calls >> become mosters calling deactivate_super that can take up 3k or more of >> stack space as syncrhonous filesystem I/O is performed, through >> multiple levels of the I/O stack. >> >> Avoid deactivate_super being called from a deep stack by converting >> mntput to use task_work_add when the mnt_count goes to 0. The >> filesystem is still unmounted synchronously preserving the semantics >> that system calls like umount require. > > Careful. For one thing, you've just introduced a massive leak in knfsd > and any other kernel thread that might do mntput(). task_work_add() > makes no sense there - there is no userland to return to. For another, > in things like cleanup of failing modprobe we might end up delaying fs > shutdown too much. So it's not that simple, unfortunately. Unfortunately. > I agree that fs shutdown is better dealt with on mostly empty stack, of > course - moreover, done right that has a potential to make mntput() > safe in atomic contexts (there's also acct_auto_close_mnt() to deal > with; that might take some work to get right, but I think it's not > fatal). I am slowly digging into this. With this patch I was was able to do an A/B comparison of what the stack cost on my unmounting my minimal ext4 filesystem from d_invalidate called with in a context with maximum symlink recursion depth, without and without a changed mntput. I used sysfs instead of nfs to mount my minimal ext4 filesystem on as I was a lazy bum and didn't have a nfs server setup handy. With just my detach_mounts branched merged into 3.15-rc0 I saw 4880 stack bytes left before calling detach_mounts from d_invalidate I saw 3904 stack bytes left after calling detach_mounts from d_invalidate Which means in practice unmounting my mininal ext4 filesystem image only used 976 additional bytes of stack. With the same kernel plus my change to mntput I saw 4880 stack bytes left before calling detach_mounts from d_invalidate I saw 4880 stack bytes left after calling detach_mounts from d_invalidate Which at least confirms that a change to mntput is enough to make deep stacks safe. With 3904 bytes of headroom from ext4 I may have to measure some of the nastier cases just to be certain there actually is a problem here. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html