Re: [PATCH/RFC] VFS: LOOKUP_MOUNTPOINT should used cached info whenever possible.

Jeff Layton <jlayton@xxxxxxxxxx> · Mon, 17 Apr 2023 11:21:28 -0400

On Mon, 2023-04-17 at 16:24 +0200, Christian Brauner wrote:
> On Mon, Apr 17, 2023 at 08:25:23AM -0400, Jeff Layton wrote:
> > On Mon, 2023-04-17 at 13:55 +0200, Christian Brauner wrote:
> > > On Mon, Apr 17, 2023 at 09:13:52AM +1000, NeilBrown wrote:
> > > > 
> > > > When performing a LOOKUP_MOUNTPOINT lookup we don't really want to
> > > > engage with underlying systems at all.  Any mount point MUST be in the
> > > > dcache with a complete direct path from the root to the mountpoint.
> > > > That should be sufficient to find the mountpoint given a path name.
> > > > 
> > > > This becomes an issue when the filesystem changes unexpected, such as
> > > > when a NFS server is changed to reject all access.  It then becomes
> > > > impossible to unmount anything mounted on the filesystem which has
> > > > changed.  We could simply lazy-unmount the changed filesystem and that
> > > > will often be sufficient.  However if the target filesystem needs
> > > > "umount -f" to complete the unmount properly, then the lazy unmount will
> > > > leave it incompletely unmounted.  When "-f" is needed, we really need to
> > > 
> > > I don't understand this yet. All I see is nfs_umount_begin() that's
> > > different for MNT_FORCE to kill remaining io. Why does that preclude
> > > MNT_DETACH? You might very well fail MNT_FORCE and the only way you can
> > > get rid is to use MNT_DETACH, no? So I don't see why that is an
> > > argument.
> > > 
> > > > be able to name the target filesystem.
> > > > 
> > > > We NEVER want to revalidate anything.  We already avoid the revalidation
> > > > of the mountpoint itself, be we won't need to revalidate anything on the
> > > > path either as thay might affect the cache, and the cache is what we are
> > > > really looking in.
> > > > 
> > > > Permission checks are a little less clear.  We currently allow any user
> > > 
> > > This is all very brittle.
> > > 
> > > First case that comes to mind is overlayfs where the permission checking
> > > is performed twice. Once on the overlayfs inode itself based on the
> > > caller's security context and a second time on the underlying inode with
> > > the security context of the mounter of the overlayfs instance.
> > > 
> > > A mounter could have dropped all privileges aside from CAP_SYS_ADMIN so
> > > they'd be able to mount the overlayfs instance but would be restricted
> > > from accessing certain directories or files. The task accessing the
> > > overlayfs instance however could have a completely different security
> > > context including CAP_DAC_READ_SEARCH and such. Both tasks could
> > > reasonably be in different user namespaces and so on.
> > > 
> > > The LSM hooks are also called twice and would now also be called once.
> > > 
> > > It also forgets that acl_permission() check may very well call into the
> > > filesystem via check_acl().
> > > 
> > > So umount could either be used to infer existence of files that the user
> > > wouldn't otherwise know they exist or in the worst case allow to umount
> > > something that they wouldn't have access to.
> > > 
> > > Aside from that this would break userspace assumptions and as Al and
> > > I've mentioned before in the other thread you'd need a new flag to
> > > umount2() for this. The permission model can't just change behind users
> > > back.
> > > 
> > > But I dislike it for the now even more special-cased umount path lookup
> > > alone tbh. I'd feel way more comfortable with a non-lookup related
> > > solution that doesn't have subtle implications for permission checking.
> > > 
> > 
> > These are good points.
> > 
> > One way around the issues you point out might be to pass down a new
> > MAY_LOOKUP_MOUNTPOINT mask flag to ->permission. That would allow the
> > filesystem driver to decide whether it wants to avoid potentially
> > problematic activity when checking permissions. nfs_permission could
> > check for that and take a more hands-off approach to the permissions
> > check. Between that and skipping d_revalidate on LOOKUP_MOUNTPOINT, I
> > think that might do what we need.
> 
> Yes, that's pretty obvious. I considered that, wrote the section and
> deleted it again because I still find this pretty ugly. It does leak
> very specific lookup information into filesystems that isn't generically
> useful like MAY_NOT_BLOCK is. Most filesystems don't need to check with
> a server like NFS does and so don't suffer from this issue.
> 

Sort of. Most of the MAY flags cover a specific operation. In this case,
we'd just be adding a flag to make it clear that this permission check
is for the specific purpose of chasing down a mountpoint (presumably to
umount).

This field does look less like a "mask" field now though, and more like
a "flags".

> The crucial change in the patchset in its current form is that you're
> requesting from the VFS to significantly alter permission checking just
> because this is a umount request in a pretty fundamental way for roughly
> 21 filesytems. Afaict, on the VFS level that doesn't make sense. The VFS
> can't just skip a filesystem's ->permission() handler with "well, it's
> just on a way to a umount so whatever". That's just not going to be
> correct and is just a source of subtle security bugs. So NAK on that.
>

Fair enough. I too think the permission check ought to be left up to the
filesystem driver. It does need some way to know that the permission
check is in the context of a LOOKUP_MOUNTPOINT pathwalk though. A MAY_*
flag seems like the obvious choice, since we could use that for checking
ACLs too, but maybe there is some better way.

> And I'm curious why is it obvious that we don't want to revalidate _any_
> path component and not just the last one? Why is that generally safe?
> Why can't this be used to access files and directories the caller
> wouldn't otherwise be able to access? I would like to have this spelled
> out for slow people like me, please.
> 
> From my point of view, this would only be somewhat safe _generally_ if
> you'd allow circumvention for revalidation and permission checking if
> MNT_FORCE is specified and the caller has capable(CAP_DAC_READ_SEARCH).
> You'd still mess with overlayfs permission model in this case though.
> 
> Plus, there are better options of solving this problem. Again, I'd
> rather build a separate api for unmounting then playing such potentially
> subtle security sensitive games with permission checking during path
> lookup.

umount(2) is really a special case because the whole intent is to detach
a mount from the local hierarchy and stop using it. The unfortunate bit
is that it is a path-based syscall.

So usually we have to:

- determine the path: Maybe stat() it and to validate that it's the
mountpoint we want to drop
- then call umount with that path

The last thing we want in that case is for the server to decide to
change some intermediate dentry in between the two operations. Best
case, you'll get back ENOENT or something when the pathwalk fails. Worst
case, the server swaps what are two different mountpoints on your client
and you unmount the wrong one.

If we don't revaliate, then we're no worse off, and may be better off if
something hinky happens to the server of an intermediate dentry in the
path.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>