Re: [PATCH v10 05/41] btrfs: check and enable ZONED mode

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Tue, 1 Dec 2020 02:29:37 +0000

On 2020/12/01 11:20, Anand Jain wrote:
> On 30/11/20 9:15 pm, Damien Le Moal wrote:
>> On 2020/11/30 21:13, Anand Jain wrote:
>>> On 28/11/20 2:44 am, David Sterba wrote:
>>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
>>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote:
>>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if
>>>>>> ZONED flag is enabled on the file system and if the file system consists of
>>>>>> zoned devices with equal zone size.
>>>>>>
>>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
>>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>
>>>>>> Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>>>>> ---
>>>>>>     fs/btrfs/ctree.h       | 11 ++++++
>>>>>>     fs/btrfs/dev-replace.c |  7 ++++
>>>>>>     fs/btrfs/disk-io.c     | 11 ++++++
>>>>>>     fs/btrfs/super.c       |  1 +
>>>>>>     fs/btrfs/volumes.c     |  5 +++
>>>>>>     fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>     fs/btrfs/zoned.h       | 26 ++++++++++++++
>>>>>>     7 files changed, 142 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>>> index aac3d6f4e35b..453f41ca024e 100644
>>>>>> --- a/fs/btrfs/ctree.h
>>>>>> +++ b/fs/btrfs/ctree.h
>>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info {
>>>>>>     	/* Type of exclusive operation running */
>>>>>>     	unsigned long exclusive_operation;
>>>>>>     
>>>>>> +	/* Zone size when in ZONED mode */
>>>>>> +	union {
>>>>>> +		u64 zone_size;
>>>>>> +		u64 zoned;
>>>>>> +	};
>>>>>> +
>>>>>>     #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>>>>>     	spinlock_t ref_verify_lock;
>>>>>>     	struct rb_root block_tree;
>>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
>>>>>>     }
>>>>>>     #endif
>>>>>>     
>>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
>>>>>> +{
>>>>>> +	return fs_info->zoned != 0;
>>>>>> +}
>>>>>> +
>>>>>>     #endif
>>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>>>>>> index 6f6d77224c2b..db87f1aa604b 100644
>>>>>> --- a/fs/btrfs/dev-replace.c
>>>>>> +++ b/fs/btrfs/dev-replace.c
>>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>>>>>>     		return PTR_ERR(bdev);
>>>>>>     	}
>>>>>>     
>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>> +		btrfs_err(fs_info,
>>>>>> +			  "dev-replace: zoned type of target device mismatch with filesystem");
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto error;
>>>>>> +	}
>>>>>> +
>>>>>>     	sync_blockdev(bdev);
>>>>>>     
>>>>>>     	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
>>>>>
>>>>>     I am not sure if it is done in some other patch. But we still have to
>>>>>     check for
>>>>>
>>>>>     (model == BLK_ZONED_HA && incompat_zoned))
>>>>
>>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
>>>> btrfs_check_device_zone_type checks for _HM.
>>>
>>>
>>> Still confusing to me. The below function, which is part of this
>>> patch, says we don't support BLK_ZONED_HM. So does it mean we
>>> allow BLK_ZONED_HA only?
>>>
>>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
>>> *fs_info,
>>> +						struct block_device *bdev)
>>> +{
>>> +	u64 zone_size;
>>> +
>>> +	if (btrfs_is_zoned(fs_info)) {
>>> +		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
>>> +		/* Do not allow non-zoned device */
>>
>> This comment does not make sense. It should be:
>>
>> 		/* Only allow zoned devices with the same zone size */
>>
>>> +		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
>>> +	}
>>> +
>>> +	/* Do not allow Host Manged zoned device */
>>> +	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
>>
>> The comment is also wrong. It should read:
>>
>> 	/* Allow only host managed zoned devices */
>>
>> This is because we decided to treat host aware devices in the same way as
>> regular block devices, since HA drives are backward compatible with regular
>> block devices.
> 
> 
> Yeah, I read about them, but I have questions like do an FS work on top 
> of a BLK_ZONED_HA without modification?

Yes. These drives are fully backward compatible and accept random writes
anywhere. Performance however is potentially a different story as the drive will
eventually need to do internal garbage collection of some sort, exactly like an
SSD, but definitely not at SSD speeds :)

>   Are we ok to replace an HM device with a HA device? Or add a HA device 
> to a btrfs on an HM device.

We have a choice here: we can treat HA drives as regular devices or treat them
as HM devices. Anything in between does not make sense. I am fine either way,
the main reason being that there are no HA drive on the market today that I know
of (this model did not have a lot of success due to the potentially very
unpredictable performance depending on the use case).

So the simplest thing to do is, in my opinion, to ignore their "zone"
characteristics and treat them as regular disks. But treating them as HM drives
is a simple to do too.

Of note is that a host-aware drive will be reported by the block layer as
BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
then the block layer will treat the drive as a regular disk.

> 
> Thanks.
> 
>>
>>> +}
>>>
>>>
>>> Also, if there is a new type of zoned device in the future, the older
>>> kernel should be able to reject the newer zone device types.
>>>
>>> And, if possible could you rename above function to
>>> btrfs_zone_type_is_valid(). Or better.
>>>
>>>
>>>>> right? What if in a non-zoned FS, a zoned device is added through the
>>>>> replace. No?
>>>>
>>>> The types of devices cannot mix, yeah. So I'd like to know the answer as
>>>> well.
>>>
>>>
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>>>>     	if (IS_ERR(bdev))
>>>>>>     		return PTR_ERR(bdev);
>>>>>>     
>>>>>> +	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
>>>>>> +		ret = -EINVAL;
>>>>>> +		goto error;
>>>>>> +	}
>>>>>> +
>>>>>>     	if (fs_devices->seeding) {
>>>>>>     		seeding_dev = 1;
>>>>>>     		down_write(&sb->s_umount);
>>>>>
>>>>> Same here too. It can also happen that a zone device is added to a non
>>>>> zoned fs.
>>>
>>>
>>> Thanks.
>>>
>>
>>
> 
> 

-- 
Damien Le Moal
Western Digital Research