Re: [PATCH v2 2/4] btrfs: mark device addition as mnt_want_write_file

On Wed, Mar 16, 2022 at 04:06:26PM +0000, Filipe Manana wrote:
> On Wed, Mar 16, 2022 at 10:22:38PM +0900, Naohiro Aota wrote:
> > btrfs_init_new_device() calls btrfs_relocate_sys_chunk() which incurs
> > file-system internal writing. That writing can cause a deadlock with
> > FS freezing, as described in commit
> > 26559780b953 ("btrfs: zoned: mark relocation as writing").
> > 
> > Mark the device addition as mnt_want_write_file. This is also consistent
> > with the removing device ioctl counterpart.
> > 
> > Cc: stable@xxxxxxxxxxxxxxx # 4.9+
> > Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>
> > ---
> >  fs/btrfs/ioctl.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 60c907b14547..a6982a1fde65 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3474,8 +3474,10 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
> >  	return ret;
> >  }
> >  
> > -static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > +static long btrfs_ioctl_add_dev(struct file *file, void __user *arg)
> >  {
> > +	struct inode *inode = file_inode(file);
> > +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> >  	struct btrfs_ioctl_vol_args *vol_args;
> >  	bool restore_op = false;
> >  	int ret;
> > @@ -3488,6 +3490,10 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> >  		return -EINVAL;
> >  	}
> >  
> > +	ret = mnt_want_write_file(file);
> > +	if (ret)
> > +		return ret;
> 
> So, this now breaks all test cases that exercise device seeding, and I clearly
> forgot about seeding when I asked about why not use mnt_want_write_file()
> instead of a bare call to sb_start_write():

Ah, yes, I also confirmed they fail.

> 
> $ ./check btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
><snip>
> Ran: btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> Failures: btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> Failed 5 of 5 tests
> 
> So device seeding introduces a special case. If we mount a seeding
> filesystem, it's RO, so the mnt_want_write_file() fails.

Yeah, so we are in a mixed state here: the FS is RO when it has a
seeding device, and it must be RW otherwise (checked in
btrfs_init_new_device()).

> Something like this deals with it and it makes the tests pass:
> 
><snip>
> 
> We are also changing the semantics as we no longer allow for adding a device
> to a RO filesystem. So the lack of a mnt_want_write_file() was intentional
> to deal with the seeding filesystem case. But calling mnt_want_write_file()
> if we are not seeding, changes the semantics - I'm not sure if anyone relies
> on the ability to add a device to a fs mounted RO, and I don't see how it's a
> useful use case.

Adding a device to an RO FS (without a seeding device) already returns
-EROFS from btrfs_init_new_device(), so there is no change in semantics.
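
To spell out the rule being relied on, here is a toy userspace model of
that check. It is a sketch only, not the kernel code: struct toy_fs and
toy_may_add_device() are made-up names, and the two booleans merely
stand in for the read-only and seeding state that
btrfs_init_new_device() looks at.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the check depends on. */
struct toy_fs {
	bool rdonly;	/* the FS is mounted read-only */
	bool seeding;	/* the FS has a seeding device */
};

/*
 * Model of the rule discussed above: a read-only FS accepts a new
 * device only when it is a seeding FS. Returns 0 when the addition
 * may proceed and -EROFS otherwise.
 */
static int toy_may_add_device(const struct toy_fs *fs)
{
	assert(fs != NULL);

	if (fs->rdonly && !fs->seeding)
		return -EROFS;
	return 0;
}
```

In this model only the "RO and not seeding" combination is rejected,
which is the case where mnt_want_write_file() would have failed as well.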

> So either we do that special casing like in that diff, or we always do the
> sb_start_write() / sb_end_write() - in any case please add a comment explaining
> why we do it like that, why we can't use mnt_want_write_file().

Conditionally using sb_start_write() or mnt_want_write_file() seems a
bit dirty. Also, marking the FS as "writing" when it's read-only seems
odd.

I'm now thinking we should call sb_start_write() around here, where the
FS is surely RW.

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 393fc7db99d3..50e02dc4e2b2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2731,6 +2731,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	mutex_unlock(&fs_devices->device_list_mutex);
 
+	sb_start_write(fs_info->sb);
+
 	if (seeding_dev) {
 		mutex_lock(&fs_info->chunk_mutex);
 		ret = init_first_rw_device(trans);
@@ -2786,6 +2788,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		ret = btrfs_commit_transaction(trans);
 	}
 
+	sb_end_write(fs_info->sb);
+
 	/*
 	 * Now that we have written a new super block to this device, check all
 	 * other fs_devices list if device_path alienates any other scanned
@@ -2801,6 +2805,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 
 error_sysfs:
+	sb_end_write(fs_info->sb);
+
 	btrfs_sysfs_remove_device(device);
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
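
The pairing the diff above aims for can be sketched as a toy userspace
model. This is an illustration only: struct toy_sb, the toy_* helpers
and the step_ret parameter are made up, the counter merely stands in
for the freeze-protection state, and it assumes every failure after the
sb_start_write() call leaves through error_sysfs.

```c
#include <assert.h>

/* Hypothetical stand-in for the superblock's writer accounting. */
struct toy_sb {
	int writers;
};

static void toy_sb_start_write(struct toy_sb *sb)
{
	sb->writers++;
}

static void toy_sb_end_write(struct toy_sb *sb)
{
	assert(sb->writers > 0);	/* catches an unbalanced end */
	sb->writers--;
}

/*
 * step_ret stands in for the result of the work done while write
 * protection is held (0 on success, negative errno on failure).
 */
static int toy_init_new_device(struct toy_sb *sb, int step_ret)
{
	int ret;

	toy_sb_start_write(sb);

	ret = step_ret;
	if (ret)
		goto error_sysfs;

	toy_sb_end_write(sb);
	return 0;

error_sysfs:
	toy_sb_end_write(sb);
	return ret;
}
```

Both the success path and the error path drop the protection exactly
once, so the counter is back to zero when the function returns.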

> Thanks.
> 
> 
> > +
> >  	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_DEV_ADD)) {
> >  		if (!btrfs_exclop_start_try_lock(fs_info, BTRFS_EXCLOP_DEV_ADD))
> >  			return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
> > @@ -3520,6 +3526,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> >  		btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
> >  	else
> >  		btrfs_exclop_finish(fs_info);
> > +	mnt_drop_write_file(file);
> >  	return ret;
> >  }
> >  
> > @@ -5443,7 +5450,7 @@ long btrfs_ioctl(struct file *file, unsigned int
> >  	case BTRFS_IOC_RESIZE:
> >  		return btrfs_ioctl_resize(file, argp);
> >  	case BTRFS_IOC_ADD_DEV:
> > -		return btrfs_ioctl_add_dev(fs_info, argp);
> > +		return btrfs_ioctl_add_dev(file, argp);
> >  	case BTRFS_IOC_RM_DEV:
> >  		return btrfs_ioctl_rm_dev(file, argp);
> >  	case BTRFS_IOC_RM_DEV_V2:
> > -- 
> > 2.35.1
> > 


