Gao feng <gaofeng@xxxxxxxxxxxxxx> writes: > On 11/15/2013 12:54 AM, Andy Lutomirski wrote: >> On Thu, Nov 14, 2013 at 3:10 AM, Gao feng <gaofeng@xxxxxxxxxxxxxx> wrote: >>> On 11/13/2013 03:26 PM, Gao feng wrote: >>>> On 11/09/2013 01:42 PM, Eric W. Biederman wrote: >>>>> Right now I would rather not have the empty directory exception than >>>>> remove this code. >>>>> >>>>> The test is a little trickier to write than it might otherwise be >>>>> because /proc and /sys tend to be slightly imperfect filesystems. >>>>> >>>>> I think the only way to really test that is to call readdir on the >>>>> directory itself :( I don't like that thought. >>>>> >>>>> I don't know what I was thinking when I wrote that test but I definitely >>>>> goofed up. Grr! >>>>> >>>>> I can certainly filter out any directory with nlink > 2. That would be >>>>> an easy partial step forward. >>>>> >>>>> The real question though is how do I detect directories it is safe to >>>>> mount on where there will not be files in them. I can't call iterate >>>>> with the namespace_lock held so things are a bit tricky. >>>>> >>>> >>>> I know this problem is not easy to be resolved. why not let the user >>>> make the decision? maybe we can introduce a new mount option MS_LOCK, >>>> if user wants to use mount to hide something, he should use mount with >>>> option MS_LOCK. so the unpriviged user can't umount this filesystem and >>>> fail to mount the filesystem if one of it's child mount is mounted with >>>> MS_LOCK option otherwise he use MS_REC too. >>>> >>> >>> Something like this. >>> >>> From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001 >>> From: Gao feng <gaofeng@xxxxxxxxxxxxxx> >>> Date: Wed, 13 Nov 2013 16:06:46 +0800 >>> Subject: [PATCH] userns: introduce new mount option MS_LOCK >>> >>> After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942 >>> vfs: Lock in place mounts from more privileged users, >>> in userns, the mounts of child mntns which copied from >>> parent mntns is locked and user has no rights to umount/move >>> them, it's too strict. >>> >>> The core purpose of above commit is trying to prevent >>> unprivileged user from accessing files hidden by mount. >>> This patch introduces a new mount option MS_LOCK, this >>> gives user the capable to mount filesystem as the type >>> of lock if he wants to use mount to hide something. >>> >> >> This is bad -- if something was secure in old kernels, it needs to >> stay secure. If you had MS_NOT_A_LOCK, that would be okay, but it >> might not solve your problem. >> > > what you mean old kernels here? I saw patch "vfs: Lock in place mounts from more privileged users" > is merged into upstream in linux 3.12-rc1, this is not very old. I think there > are not many userspace processes rely on this feature. Sort of true. Most people aren't that silly. This feature was added to defend against a theoretical attack that you can use with mount namespaces. In particular the scenario we are concerned with is: Suppose the file system looks like: Suppose there are two filesystems a and b that look like: a:/usr/ a:/usr/my_very_secret_file a:/dev/ a:/etc/ a:/lib/ b:/bin/ b:/etc/ b:/games/ b:/include/ b:/lib/ b:/lib32/ b:/local/ b:/sbin/ b:/share/ b:/src/ And filesystem b is mounted on a:/usr hiding a:/usr/my_very_secret_file So the filesystem looks like: /usr/ /usr/bin/ /usr/etc/ /usr/games/ /usr/include/ /usr/lib/ /usr/lib32/ /usr/local/ /usr/sbin/ /usr/share/ /usr/src/ /dev/ /etc/ /lib/ Without locking mounts into place an unprivileged user can clone the mount namespace and do "umount /usr" and read /usr/my_very_secret_file. Most systems don't hide sensitive things with mounts but it is very possible and guarding against is fairly cheap and easy. And while a little annoying it should not be a large impediment to unprivileged user of the user namespace because pivot root still works. This thread started talking about bugs in fs_fully_visible. And those bugs are fixable and I aim to get to them shortly. At the very least I can lie and test for nlink <= 2 which fixes the regression in mounting proc. Then I can write the fun version that takes references and drops locks so it can call the internal version of readdir to see if a directory is actually empty. But the principle remains the same we really don't want to reveal anything that is hidden under a mount on purpose or by mistake. Just because then we don't have to think about those things from a security point of view making everyone's life easier. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html