Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > On Wed, Apr 09, 2014 at 06:53:23PM +0100, Al Viro wrote: > >> For starters, put that ext4 on top of dm-raid or dm-multipath. That alone >> will very likely push you over the top. >> >> Keep in mind, BTW, that you do not have full 8K to play with - there's >> struct thread_info that should not be stepped upon. Not particulary large >> (IIRC, restart_block is the largest piece in amd64 one), but it eats about >> 100 bytes. >> >> I'd probably use renameat(2) in testing - i.e. trigger the shite when >> resolving a deeply nested symlink in renameat() arguments. That brings >> extra struct nameidata into the game, i.e. extra 152 bytes chewed off the >> stack. > > Come to think of that, some extra nastiness could be had by mixing it with > execve(). You can have up to 4 levels of #! resolution there, each eating > up at least 128 bytes (more, actually). Compiler _might_ turn that > tail call of search_binary_handler() into a jump, but it's not guaranteed > at all. > > FWIW, it probably makes sense to turn load_script() into > static int load_script(struct linux_binprm *bprm) > { > int err = __load_script(bprm); > if (err) > return err; > return search_binary_handler(bprm); > } > > regardless of that issue; we don't need interp[] after the call of > open_exec(), so it makes sense to reduce the footprint in mutual > recursion loop. > > For extra pain, consider s/ext4/xfs/, possibly with iscsi thrown under the > bus^Wdm-multipath. > > The thing is, we are already too close to stack overflow limit. Adding > several kilobytes more is not survivable, and since you are taking > somebody in a userns DoSing the system into consideration, you can't > say "it takes malicious root to set up, so it's not serious" - the > DoS you mentioned requires the same thing... Thank you for the comments this makes it clear that the problem is with mntput (and the filesystem I/O that can be triggered) not particularly with detach_mounts. As I read the code all of these nasty cases we are concern with today can be triggered with pathput/mntput already with an appropriate race against umount, which means my detach_mounts code doesn't introduce a stack space usage regression, but seems to be the messenger that we have such problems in the VFS. There is a also a big difference between what can be triggered using filesystems we allow unprivileged users to mount, devpts, proc, ramfs, sysfs, mqueue, shmem (none of which have backing store) and filesystems with a long I/O path. So it looks like it still requires global root to trigger this, although I still think it is serious. One of the more interesting aspects of this userns work is running into code that semantically should be safe for unprivileged users to use but as we haven't historically allowed unprivileged users to use the code there are silly assumptions and untested code paths. > BTW, another thing to test would be this: > mount nfs on /mnt > mount a filesystem on /mnt/path that can be invalidated > cd to /mnt/path/foo > bind /mnt on /mnt/path/foo/bar > shoot /mnt/path (on server) > stat bar/path/foo > That should rip the fs you are in out of the tree; it should work, but > it's definitely a case worth testing. Agreed that is a case worth test. I wasn't looking at that case in particular as that is not the worst case stack usage or even an approximation of it. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html