Re: [PATCH v10 05/41] btrfs: check and enable ZONED mode

Anand Jain <anand.jain@xxxxxxxxxx> · Tue, 1 Dec 2020 13:53:22 +0800

On 1/12/20 10:29 am, Damien Le Moal wrote:
On 2020/12/01 11:20, Anand Jain wrote:
On 30/11/20 9:15 pm, Damien Le Moal wrote:
On 2020/11/30 21:13, Anand Jain wrote:
On 28/11/20 2:44 am, David Sterba wrote:
On Wed, Nov 18, 2020 at 07:29:20PM +0800, Anand Jain wrote:
On 10/11/20 7:26 pm, Naohiro Aota wrote:
This commit introduces the function btrfs_check_zoned_mode() to check if
ZONED flag is enabled on the file system and if the file system consists of
zoned devices with equal zone size.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>
Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
---
     fs/btrfs/ctree.h       | 11 ++++++
     fs/btrfs/dev-replace.c |  7 ++++
     fs/btrfs/disk-io.c     | 11 ++++++
     fs/btrfs/super.c       |  1 +
     fs/btrfs/volumes.c     |  5 +++
     fs/btrfs/zoned.c       | 81 ++++++++++++++++++++++++++++++++++++++++++
     fs/btrfs/zoned.h       | 26 ++++++++++++++
     7 files changed, 142 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac3d6f4e35b..453f41ca024e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -948,6 +948,12 @@ struct btrfs_fs_info {
     	/* Type of exclusive operation running */
     	unsigned long exclusive_operation;
     
+	/* Zone size when in ZONED mode */
+	union {
+		u64 zone_size;
+		u64 zoned;
+	};
+
     #ifdef CONFIG_BTRFS_FS_REF_VERIFY
     	spinlock_t ref_verify_lock;
     	struct rb_root block_tree;
@@ -3595,4 +3601,9 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info)
     }
     #endif
     
+static inline bool btrfs_is_zoned(struct btrfs_fs_info *fs_info)
+{
+	return fs_info->zoned != 0;
+}
+
     #endif
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 6f6d77224c2b..db87f1aa604b 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -238,6 +238,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
     		return PTR_ERR(bdev);
     	}
     
+	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+		btrfs_err(fs_info,
+			  "dev-replace: zoned type of target device mismatch with filesystem");
+		ret = -EINVAL;
+		goto error;
+	}
+
     	sync_blockdev(bdev);
     
     	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {

     I am not sure if it is done in some other patch. But we still have to
     check for

     (model == BLK_ZONED_HA && incompat_zoned))

Do you really mean BLK_ZONED_HA, ie. host-aware (HA)?
btrfs_check_device_zone_type checks for _HM.


Still confusing to me. The below function, which is part of this
patch, says we don't support BLK_ZONED_HM. So does it mean we
allow BLK_ZONED_HA only?

+static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info
*fs_info,
+						struct block_device *bdev)
+{
+	u64 zone_size;
+
+	if (btrfs_is_zoned(fs_info)) {
+		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
+		/* Do not allow non-zoned device */

This comment does not make sense. It should be:

		/* Only allow zoned devices with the same zone size */

+		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+	}
+
+	/* Do not allow Host Manged zoned device */
+	return bdev_zoned_model(bdev) != BLK_ZONED_HM;

The comment is also wrong. It should read:

	/* Allow only host managed zoned devices */

This is because we decided to treat host aware devices in the same way as
regular block devices, since HA drives are backward compatible with regular
block devices.


Yeah, I read about them, but I have questions like do an FS work on top
of a BLK_ZONED_HA without modification?

Yes. These drives are fully backward compatible and accept random writes
anywhere. Performance however is potentially a different story as the drive will
eventually need to do internal garbage collection of some sort, exactly like an
SSD, but definitely not at SSD speeds :)

   Are we ok to replace an HM device with a HA device? Or add a HA device
to a btrfs on an HM device.

We have a choice here: we can treat HA drives as regular devices or treat them
as HM devices. Anything in between does not make sense. I am fine either way,
the main reason being that there are no HA drive on the market today that I know
of (this model did not have a lot of success due to the potentially very
unpredictable performance depending on the use case).

So the simplest thing to do is, in my opinion, to ignore their "zone"
characteristics and treat them as regular disks. But treating them as HM drives
is a simple to do too.
> Of note is that a host-aware drive will be reported by the block layer as
BLK_ZONED_HA only as long as the drive does not have any partition. If it does,
then the block layer will treat the drive as a regular disk.

IMO. For now, it is better to check for the BLK_ZONED_HA explicitly in a 
non-zoned-btrfs. And check for BLK_ZONED_HM explicitly in a zoned-btrfs. 
This way, if there is another type of BLK_ZONED_xx in the future, we 
have the opportunity to review to support it. As below [1]...

[1]
bool btrfs_check_device_type()
{
	if (bdev_is_zoned()) {
		if (btrfs_is_zoned())
			if (bdev_zoned_model == BLK_ZONED_HM)
			/* also check the zone_size. */
				return true;
		else
			if (bdev_zoned_model == BLK_ZONED_HA)
			/* a regular device and FS, no zone_size to check I think? */
				return true;
	} else {
		if (!btrfs_is_zoned())
			return true
	}

	return false;
}

Thanks.



Thanks.


+}


Also, if there is a new type of zoned device in the future, the older
kernel should be able to reject the newer zone device types.

And, if possible could you rename above function to
btrfs_zone_type_is_valid(). Or better.


right? What if in a non-zoned FS, a zoned device is added through the
replace. No?

The types of devices cannot mix, yeah. So I'd like to know the answer as
well.


--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2518,6 +2518,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
     	if (IS_ERR(bdev))
     		return PTR_ERR(bdev);
     
+	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+		ret = -EINVAL;
+		goto error;
+	}
+
     	if (fs_devices->seeding) {
     		seeding_dev = 1;
     		down_write(&sb->s_umount);

Same here too. It can also happen that a zone device is added to a non
zoned fs.


Thanks.