Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Omar Sandoval <osandov@xxxxxxxxxxx> writes:

> On Mon, Mar 30, 2015 at 02:30:34PM +0200, David Sterba wrote:
>> On Mon, Mar 30, 2015 at 02:02:17AM -0700, Omar Sandoval wrote:
>> > Before commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
>> > d_invalidate() could return -EBUSY when a dentry for a directory had
>> > more than one reference to it. This is what prevented a mounted
>> > subvolume from being deleted, as struct vfsmount holds a reference to
>> > the subvolume dentry. However, that commit removed that case, and later
>> > commits in that patch series removed the return code from d_invalidate()
>> > completely, so we don't get that check for free anymore. So, reintroduce
>> > it in btrfs_ioctl_snap_destroy().
>> 
>> > This applies to 4.0-rc6. To be honest, I'm not sure that this is the most
>> > correct fix for this bug, but it's equivalent to the pre-3.18 behavior and it's
>> > the best that I could come up with. Thoughts?
>> 
>> > +	spin_lock(&dentry->d_lock);
>> > +	err = dentry->d_lockref.count > 1 ? -EBUSY : 0;
>> > +	spin_unlock(&dentry->d_lock);
>> 
>> The fix restores the original behaviour, but I don't think opencoding and
>> using internals is fine. Either there should be a vfs api for that or
>> there's an existing one that can be used instead.

I have a problem with restoring the original behavior as is.

In some sense it re-introduces the security issue that the d_invalidate
changes were built to fix.

Any user in the system can create a user namespace, create a mount
namespace and keep any subvolume pinned forever.  Which at the very
least could make a very nice DOS attack.  I am not familiar enough with
how people use subvolumes and 

So let me ask.  How can userspace not know that a subvolume that they
want to delete is already mounted?

I can see having something like is_local_mount_root and denying the
subvolume destruction if the mount that is pinning it is in your local
mount namespace.  


>> The bug here seems defined up to the point that we're trying to delete a
>> subvolume that's a mountpoint. My next guess is that a check
>> 
>> 	if (d_mountpoint(&dentry)) { ... }
>> 
>> could work.
>
> That was my first instinct as well, but d_mountpoint() is true for
> dentries that have a filesystem mounted on them (e.g., after mount
> /dev/sda1 /mnt, the dentry for "/mnt"), not the dentry that is mounted.
>
> I poked around the mount code for awhile and couldn't come up with
> anything using the existing interface. Mounting subvolumes bubbles down
> to mount_subtree(), which doesn't really leave any traces of which
> subvolume is mounted except for the dentry in struct vfsmount.
>
> (As far as I can tell, under the covers subvolume deletion is more or
> less equivalent to an rm -rf, and we obviously don't do anything to stop
> users from doing that on the root of their mounted filesystem, but it
> appears that users expect the original behavior.)
>
> Here's an idea: mark mount root dentries as such in the VFS and check it
> in the Btrfs code. Adding fsdevel ML for comments
> (https://lkml.org/lkml/2015/3/30/125 is the original message).

Marking root dentries is needed to fix the bug that you can escape
the limitations of loopback mounts with a carefully placed rename.

I have a patch cooking that marks mountpoints and tracks all of the
mounts on a dentry.  So except for the possibility of stepping on each
others toes I have no objections.

Eric

> ----
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 74609b9..8a0933d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2384,6 +2384,11 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
>  		goto out_dput;
>  	}
>  
> +	if (d_is_mount_root(dentry)) {
> +		err = -EBUSY;
> +		goto out_dput;
> +	}
> +
>  	mutex_lock(&inode->i_mutex);
>  
>  	/*
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 82ef140..a28ca15 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -920,6 +920,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
>  		return ERR_CAST(root);
>  	}
>  
> +	spin_lock(&root->d_lock);
> +	root->d_flags |= DCACHE_MOUNT_ROOT;
> +	spin_unlock(&root->d_lock);
> +
>  	mnt->mnt.mnt_root = root;
>  	mnt->mnt.mnt_sb = root->d_sb;
>  	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
> @@ -1017,6 +1021,8 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
>  
>  static void cleanup_mnt(struct mount *mnt)
>  {
> +	struct dentry *root = mnt->mnt.mnt_root;
> +
>  	/*
>  	 * This probably indicates that somebody messed
>  	 * up a mnt_want/drop_write() pair.  If this
> @@ -1031,7 +1037,10 @@ static void cleanup_mnt(struct mount *mnt)
>  	if (unlikely(mnt->mnt_pins.first))
>  		mnt_pin_kill(mnt);
>  	fsnotify_vfsmount_delete(&mnt->mnt);
> -	dput(mnt->mnt.mnt_root);
> +	spin_lock(&root->d_lock);
> +	root->d_flags &= ~DCACHE_MOUNT_ROOT;
> +	spin_unlock(&root->d_lock);
> +	dput(root);
>  	deactivate_super(mnt->mnt.mnt_sb);
>  	mnt_free_id(mnt);
>  	call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index d835879..d974ab8 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -225,6 +225,7 @@ struct dentry_operations {
>  
>  #define DCACHE_MAY_FREE			0x00800000
>  #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
> +#define DCACHE_MOUNT_ROOT		0x02000000 /* is the root of a mount */
>  
>  extern seqlock_t rename_lock;
>  
> @@ -401,6 +402,16 @@ static inline bool d_mountpoint(const struct dentry *dentry)
>  	return dentry->d_flags & DCACHE_MOUNTED;
>  }
>  
> +static inline bool d_is_mount_root(const struct dentry *dentry)
> +{
> +	bool ret;
> +
> +	spin_lock(&dentry->d_lock);
> +	ret = dentry->d_flags & DCACHE_MOUNT_ROOT;
> +	spin_unlock(&dentry->d_lock);
> +	return ret;
> +}
> +
>  /*
>   * Directory cache entry type accessor functions.
>   */
> ----
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux