Re: [GIT PULL] Detaching mounts on unlink for 3.15-rc1

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Wed, 09 Apr 2014 15:49:09 -0700

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Wed, Apr 09, 2014 at 06:53:23PM +0100, Al Viro wrote:
>
>> For starters, put that ext4 on top of dm-raid or dm-multipath.  That alone
>> will very likely push you over the top.
>> 
>> Keep in mind, BTW, that you do not have full 8K to play with - there's
>> struct thread_info that should not be stepped upon.  Not particulary large
>> (IIRC, restart_block is the largest piece in amd64 one), but it eats about
>> 100 bytes.
>> 
>> I'd probably use renameat(2) in testing - i.e. trigger the shite when
>> resolving a deeply nested symlink in renameat() arguments.  That brings
>> extra struct nameidata into the game, i.e. extra 152 bytes chewed off the
>> stack.
>
> Come to think of that, some extra nastiness could be had by mixing it with
> execve().  You can have up to 4 levels of #! resolution there, each eating
> up at least 128 bytes (more, actually).  Compiler _might_ turn that
> tail call of search_binary_handler() into a jump, but it's not guaranteed
> at all.
>
> FWIW, it probably makes sense to turn load_script() into
> static int load_script(struct linux_binprm *bprm)
> {
> 	int err = __load_script(bprm);
> 	if (err)
> 		return err;
> 	return search_binary_handler(bprm);
> }
>
> regardless of that issue; we don't need interp[] after the call of
> open_exec(), so it makes sense to reduce the footprint in mutual
> recursion loop.
>
> For extra pain, consider s/ext4/xfs/, possibly with iscsi thrown under the
> bus^Wdm-multipath.
>
> The thing is, we are already too close to stack overflow limit.  Adding
> several kilobytes more is not survivable, and since you are taking
> somebody in a userns DoSing the system into consideration, you can't
> say "it takes malicious root to set up, so it's not serious" - the
> DoS you mentioned requires the same thing...

Thank you for the comments this makes it clear that the problem is with
mntput (and the filesystem I/O that can be triggered) not particularly
with detach_mounts.  As I read the code all of these nasty cases we are
concern with today can be triggered with pathput/mntput already with an
appropriate race against umount, which means my detach_mounts code
doesn't introduce a stack space usage regression, but seems to be the
messenger that we have such problems in the VFS.

There is a also a big difference between what can be triggered using
filesystems we allow unprivileged users to mount, devpts, proc, ramfs,
sysfs, mqueue, shmem (none of which have backing store) and filesystems
with a long I/O path. 

So it looks like it still requires global root to trigger this, although
I still think it is serious. 

One of the more interesting aspects of this userns work is running into
code that semantically should be safe for unprivileged users to use but
as we haven't historically allowed unprivileged users to use the code
there are silly assumptions and untested code paths.

> BTW, another thing to test would be this:
> 	mount nfs on /mnt
> 	mount a filesystem on /mnt/path that can be invalidated
> 	cd to /mnt/path/foo
> 	bind /mnt on /mnt/path/foo/bar
> 	shoot /mnt/path (on server)
> 	stat bar/path/foo
> That should rip the fs you are in out of the tree; it should work, but
> it's definitely a case worth testing.

Agreed that is a case worth test.  I wasn't looking at that case in
particular as that is not the worst case stack usage or even an
approximation of it.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html