On Sat, May 30, 2015 at 9:59 AM, Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
> Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
> mounted subvolumes can be deleted because d_invalidate() won't fail.
> However, we run into problems when we attempt to delete the default
> subvolume while it is mounted as the root filesystem:
>
> # btrfs subvol list /
> ID 257 gen 306 top level 5 path rootvol
> ID 267 gen 334 top level 5 path snap1
> # btrfs subvol get-default /
> ID 267 gen 334 top level 5 path snap1
> # btrfs inspect-internal rootid /
> 267
> # mount -o subvol=/ /dev/vda1 /mnt
> # btrfs subvol del /mnt/snap1
> Delete subvolume (no-commit): '/mnt/snap1'
> ERROR: cannot delete '/mnt/snap1' - Operation not permitted
> # findmnt /
> findmnt: can't read /proc/mounts: No such file or directory
> # ls /proc
> #
>
> Markus reported that this same scenario simply led to a kernel oops.
>
> This happens because in btrfs_ioctl_snap_destroy(), we call
> d_invalidate() before we check may_destroy_subvol(), which means that we
> detach the submounts and drop the dentry before erroring out. Instead,
> we should only invalidate the dentry once we know that we're going
> through with the deletion.
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate")
> Reported-by: Markus Schauler <mschauler@xxxxxxxxx>
> Signed-off-by: Omar Sandoval <osandov@xxxxxxxxxxx>
> ---
> The other fix for preventing all mounted subvolumes from being deleted
> would preclude this, but it sounded like we were leaning towards
> enforcing that in userspace once subvolume info becomes available in
> /proc/mounts, so this should be fixed separately.
>
> fs/btrfs/ioctl.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 1c22c6518504..8edb8544088b 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2413,14 +2413,14 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
>  		goto out_unlock_inode;
>  	}
>
> -	d_invalidate(dentry);
> -
>  	down_write(&root->fs_info->subvol_sem);
>
>  	err = may_destroy_subvol(dest);
>  	if (err)
>  		goto out_up_write;
>
> +	d_invalidate(dentry);
> +

Any reason not to call d_invalidate() only if the call to
btrfs_unlink_subvol() succeeds? I don't see a reason why we should
invalidate the dentry before the actual deletion has succeeded (before
that point, the metadata reservation can fail, or we can fail to
start/join a transaction, etc.).

Also, would you consider making an xfstest for this?

Thanks

>  	btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
>  	/*
>  	 * One for dir inode, two for dir entries, two for root
> --
> 2.4.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
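
To make the ordering question from the review concrete, here is a minimal user-space sketch. It is not the kernel code: struct mock_dentry, struct mock_outcomes, mock_d_invalidate() and mock_snap_destroy() are hypothetical stand-ins invented for illustration, and the step order is only assumed from the discussion above (permission check, metadata reservation, transaction start, unlink). The point it shows is that if the invalidation runs only after every fallible step has succeeded, no error path can leave the dentry dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only records whether it was invalidated. */
struct mock_dentry {
	bool invalidated;
};

/* Hypothetical knobs that inject an error (0 means success) at each step. */
struct mock_outcomes {
	int may_destroy_err;	/* e.g. -EPERM for the default subvolume */
	int reserve_err;	/* e.g. -ENOSPC from the metadata reservation */
	int start_trans_err;	/* e.g. -ENOMEM when starting the transaction */
	int unlink_err;		/* error from the unlink step itself */
};

/* Stand-in for d_invalidate(): just flips the flag. */
static void mock_d_invalidate(struct mock_dentry *d)
{
	d->invalidated = true;
}

/*
 * Mock of the destroy path with the invalidation placed last.
 * Every early return leaves the dentry untouched.
 */
static int mock_snap_destroy(struct mock_dentry *d,
			     const struct mock_outcomes *o)
{
	assert(d != NULL && o != NULL);

	if (o->may_destroy_err)
		return o->may_destroy_err;
	if (o->reserve_err)
		return o->reserve_err;
	if (o->start_trans_err)
		return o->start_trans_err;
	if (o->unlink_err)
		return o->unlink_err;

	/* All fallible steps succeeded, so the deletion is really happening. */
	mock_d_invalidate(d);
	return 0;
}
```

With may_destroy_err set to -EPERM (the default-subvolume case from the report), mock_snap_destroy() returns -EPERM and the dentry stays valid; the same holds for a failed reservation or transaction start. Only the all-zero case invalidates the dentry.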