On 2020/12/01 14:54, Anand Jain wrote: > On 1/12/20 10:29 am, Damien Le Moal wrote: >> On 2020/12/01 11:20, Anand Jain wrote: >>> On 30/11/20 9:15 pm, Damien Le Moal wrote: >>>> On 2020/11/30 21:13, Anand Jain wrote: >>>>> On 28/11/20 2:44 am, David Sterba wrote: >>>>>> On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote: >>>>>>> On 10/11/20 7:26 pm, Naohiro Aota wrote: >>>>>>>> This commit introduces the function btrfs_check_zoned_mode() to check if >>>>>>>> ZONED flag is enabled on the file system and if the file system consists of >>>>>>>> zoned devices with equal zone size. >>>>>>>> >>>>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx> >>>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> >>>>>>>> Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx> >>>>>>>> Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx> >>>>>>>> --- >>>>>>>> fs/btrfs/ctree.h | 11 ++++++ >>>>>>>> fs/btrfs/dev-replace.c | 7 ++++ >>>>>>>> fs/btrfs/disk-io.c | 11 ++++++ >>>>>>>> fs/btrfs/super.c | 1 + >>>>>>>> fs/btrfs/volumes.c | 5 +++ >>>>>>>> fs/btrfs/zoned.c | 81 ++++++++++++++++++++++++++++++++++++++++++ >>>>>>>> fs/btrfs/zoned.h | 26 ++++++++++++++ >>>>>>>> 7 files changed, 142 insertions(+) >>>>>>>> >>>>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >>>>>>>> index aac3d6f4e35b..453f41ca024e 100644 >>>>>>>> --- a/fs/btrfs/ctree.h >>>>>>>> +++ b/fs/btrfs/ctree.h >>>>>>>> @@ -948,6 +948,12 @@ struct btrfs_fs_info { >>>>>>>> /* Type of exclusive operation running */ >>>>>>>> unsigned long exclusive_operation; >>>>>>>> >>>>>>>> + /* Zone size when in ZONED mode */ >>>>>>>> + union { >>>>>>>> + u64 zone_size; >>>>>>>> + u64 zoned; >>>>>>>> + }; >>>>>>>> + >>>>>>>> #ifdef CONFIG_BTRFS_FS_REF_VERIFY >>>>>>>> spinlock_t ref_verify_lock; >>>>>>>> struct rb_root block_tree; >>>>>>>> @@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info) >>>>>>>> } >>>>>>>> #endif >>>>>>>> >>>>>>>> +static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info) >>>>>>>> +{ >>>>>>>> + return fs_info->zoned != 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> #endif >>>>>>>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c >>>>>>>> index 6f6d77224c2b..db87f1aa604b 100644 >>>>>>>> --- a/fs/btrfs/dev-replace.c >>>>>>>> +++ b/fs/btrfs/dev-replace.c >>>>>>>> @@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, >>>>>>>> return PTR_ERR(bdev); >>>>>>>> } >>>>>>>> >>>>>>>> + if (!btrfs_check_device_zone_type(fs_info, bdev)) { >>>>>>>> + btrfs_err(fs_info, >>>>>>>> + "dev-replace: zoned type of target device mismatch with filesystem"); >>>>>>>> + ret = -EINVAL; >>>>>>>> + goto error; >>>>>>>> + } >>>>>>>> + >>>>>>>> sync_blockdev(bdev); >>>>>>>> >>>>>>>> list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) { >>>>>>> >>>>>>> I am not sure if it is done in some other patch. But we still have to >>>>>>> check for >>>>>>> >>>>>>> (model == BLK_ZONED_HA && incompat_zoned)) >>>>>> >>>>>> Do you really mean BLK_ZONED_HA, ie. host-aware (HA)? >>>>>> btrfs_check_device_zone_type checks for _HM. >>>>> >>>>> >>>>> Still confusing to me. The below function, which is part of this >>>>> patch, says we don't support BLK_ZONED_HM. So does it mean we >>>>> allow BLK_ZONED_HA only? >>>>> >>>>> +static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info >>>>> *fs_info, >>>>> + struct block_device *bdev) >>>>> +{ >>>>> + u64 zone_size; >>>>> + >>>>> + if (btrfs_is_zoned(fs_info)) { >>>>> + zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT; >>>>> + /* Do not allow non-zoned device */ >>>> >>>> This comment does not make sense. It should be: >>>> >>>> /* Only allow zoned devices with the same zone size */ >>>> >>>>> + return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size; >>>>> + } >>>>> + >>>>> + /* Do not allow Host Manged zoned device */ >>>>> + return bdev_zoned_model(bdev) != BLK_ZONED_HM; >>>> >>>> The comment is also wrong. It should read: >>>> >>>> /* Allow only host managed zoned devices */ >>>> >>>> This is because we decided to treat host aware devices in the same way as >>>> regular block devices, since HA drives are backward compatible with regular >>>> block devices. >>> >>> >>> Yeah, I read about them, but I have questions like do an FS work on top >>> of a BLK_ZONED_HA without modification? >> >> Yes. These drives are fully backward compatible and accept random writes >> anywhere. Performance however is potentially a different story as the drive will >> eventually need to do internal garbage collection of some sort, exactly like an >> SSD, but definitely not at SSD speeds :) >> >>> Are we ok to replace an HM device with a HA device? Or add a HA device >>> to a btrfs on an HM device. >> >> We have a choice here: we can treat HA drives as regular devices or treat them >> as HM devices. Anything in between does not make sense. I am fine either way, >> the main reason being that there are no HA drive on the market today that I know >> of (this model did not have a lot of success due to the potentially very >> unpredictable performance depending on the use case). >> >> So the simplest thing to do is, in my opinion, to ignore their "zone" >> characteristics and treat them as regular disks. But treating them as HM drives >> is a simple to do too. >>> Of note is that a host-aware drive will be reported by the block layer as >> BLK_ZONED_HA only as long as the drive does not have any partition. If it does, >> then the block layer will treat the drive as a regular disk. > > IMO. For now, it is better to check for the BLK_ZONED_HA explicitly in a > non-zoned-btrfs. And check for BLK_ZONED_HM explicitly in a zoned-btrfs. Sure, we can. But since HA drives are backward compatible, not sure the HA check for non-zoned make sense. As long as the zoned flag is not set, the drive can be used like a regular disk. If the user really want to use it as a zoned drive, then it can format with force selecting the zoned flag in btrfs super. Then the HA drive will be used as a zoned disk, exactly like HM disks. > This way, if there is another type of BLK_ZONED_xx in the future, we > have the opportunity to review to support it. As below [1]... It is very unlikely that we will see any other zone model. ZNS adopted the HM model in purpose, to avoid multiplying the possible models, making the ecosystem effort a nightmare. > > [1] > bool btrfs_check_device_type() > { > if (bdev_is_zoned()) { > if (btrfs_is_zoned()) > if (bdev_zoned_model == BLK_ZONED_HM) > /* also check the zone_size. */ > return true; > else > if (bdev_zoned_model == BLK_ZONED_HA) > /* a regular device and FS, no zone_size to check I think? */ > return true; > } else { > if (!btrfs_is_zoned()) > return true > } > > return false; > } > > Thanks. Works for me. May be reverse the conditions to make things easier to read and understand: bool btrfs_check_device_type() { if (btrfs_is_zoned()) { if (bdev_is_zoned()) { /* also check the zone_size. */ return true; } /* * Regular device: emulate zones with zone size equal * to device extent size. */ return true; } if (bdev_zoned_model == BLK_ZONED_HM) { /* Zoned HM device require zoned btrfs */ return false; } /* Regular device or zoned HA device used as a regular device */ return true; } -- Damien Le Moal Western Digital Research