On 1/8/20 3:40 PM, Damien Le Moal wrote: > On 2020/01/08 16:13, Nobody wrote: >> From: Bob Liu <bob.liu@xxxxxxxxxx> >> >> Motivation: >> Now the dm-zoned device mapper target exposes a zoned block device(ZBC) as a >> regular block device by storing metadata and buffering random writes in >> conventional zones. >> This way is not very flexible, there must be enough conventional zones and the >> performance may be constrained. >> By putting metadata(also buffering random writes) in separated device we can get >> more flexibility and potential performance improvement e.g by storing metadata >> in faster device like persistent memory. >> >> This patch try to split the metadata of dm-zoned to an extra block >> device instead of zoned block device itself. >> (Buffering random writes also in the todo list.) >> >> Patch is at the very early stage, just want to receive some feedback about >> this extension. >> Another option is to create an new md-zoned device with separated metadata >> device based on md framework. > > For metadata only, it should not be hard at all to move to another > conventional zone device. It will however be a little more tricky for > conventional zones used for data since dm-zoned assumes that this random > write space is also zoned. Moving this space to a conventional device > requires implementing a zone emulation (fake zones) for the regular > drive, using a zone size that matches the size of sequential zones. > Indeed. I'll try to implement zone emulation next version. > Beyond this, dm-zoned also needs to be changed to accept partial drives > and the dm core code to accept mixing of regular and zoned disks (that > is forbidden now). > Do you mean the check in device_area_is_invalid()? 320 /* 321 * If the target is mapped to zoned block device(s), check 322 * that the zones are not partially mapped. 323 */ 324 if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) { > Another approach worth exploring is stacking dm-zoned as is on top of a > modified dm-linear with the ability to emulate conventional zones on top > of a regular block device (you only need report zones method > implemented). That is much easier to do. We actually hacked something > like that last month to see the performance change and saw a jump from > 56 MB/s to 250 MB/s for write intensive workloads (file creation on > ext4). I am not so warm in this approach though as it modifies dm-linear > and we want to keep it very lean and simple. Modifying dm-zoned to allow > support for a device pair is a better approach I think. > >> >> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx> >> --- >> drivers/md/dm-zoned-metadata.c | 6 +++--- >> drivers/md/dm-zoned-target.c | 15 +++++++++++++-- >> drivers/md/dm-zoned.h | 1 + >> 3 files changed, 17 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c >> index 22b3cb0..89cd554 100644 >> --- a/drivers/md/dm-zoned-metadata.c >> +++ b/drivers/md/dm-zoned-metadata.c >> @@ -439,7 +439,7 @@ static struct dmz_mblock *dmz_get_mblock_slow(struct dmz_metadata *zmd, >> >> /* Submit read BIO */ >> bio->bi_iter.bi_sector = dmz_blk2sect(block); >> - bio_set_dev(bio, zmd->dev->bdev); >> + bio_set_dev(bio, zmd->dev->meta_bdev); >> bio->bi_private = mblk; >> bio->bi_end_io = dmz_mblock_bio_end_io; >> bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO); >> @@ -593,7 +593,7 @@ static int dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, >> set_bit(DMZ_META_WRITING, &mblk->state); >> >> bio->bi_iter.bi_sector = dmz_blk2sect(block); >> - bio_set_dev(bio, zmd->dev->bdev); >> + bio_set_dev(bio, zmd->dev->meta_bdev); >> bio->bi_private = mblk; >> bio->bi_end_io = dmz_mblock_bio_end_io; >> bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); >> @@ -620,7 +620,7 @@ static int dmz_rdwr_block(struct dmz_metadata *zmd, int op, sector_t block, >> return -ENOMEM; >> >> bio->bi_iter.bi_sector = dmz_blk2sect(block); >> - bio_set_dev(bio, zmd->dev->bdev); >> + bio_set_dev(bio, zmd->dev->meta_bdev); >> bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO); >> bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0); >> ret = submit_bio_wait(bio); >> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c >> index 70a1063..55c64fe 100644 >> --- a/drivers/md/dm-zoned-target.c >> +++ b/drivers/md/dm-zoned-target.c >> @@ -39,6 +39,7 @@ struct dm_chunk_work { >> */ >> struct dmz_target { >> struct dm_dev *ddev; >> + struct dm_dev *metadata_dev; > > This naming would be confusing as it implies metadata only. If you also > want to move the random write zones to a regular device, then I would > suggest names like: > > ddev -> zoned_bdev (the zoned device, e.g. SMR disk) > metadata_dev -> reg_bdev (regular block device, e.g. an SSD) > Will update. > The metadata+random write (fake) zones space can use the regular block > device, and all sequential zones are assumed to be on the zoned device. > Care must be taken to handle the case of a zoned device that has > conventional zones: these could be used as is, not needing any reclaim, Agree, that won't make things too complicate. Thank you for all the suggestions. Regards, Bob > so potentially contributing to further optimization. > >> >> unsigned long flags; >> >> @@ -701,6 +702,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path) >> goto err; >> } >> >> + dev->meta_bdev = dmz->metadata_dev->bdev; >> dev->bdev = dmz->ddev->bdev; >> (void)bdevname(dev->bdev, dev->name); >> >> @@ -761,7 +763,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv) >> int ret; >> >> /* Check arguments */ >> - if (argc != 1) { >> + if (argc != 2) { >> ti->error = "Invalid argument count"; >> return -EINVAL; >> } >> @@ -774,8 +776,16 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv) >> } >> ti->private = dmz; >> >> + /* Get the metadata block device */ >> + ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), >> + &dmz->metadata_dev); >> + if (ret) { >> + dmz->metadata_dev = NULL; >> + goto err; >> + } >> + >> /* Get the target zoned block device */ >> - ret = dmz_get_zoned_device(ti, argv[0]); >> + ret = dmz_get_zoned_device(ti, argv[1]); >> if (ret) { >> dmz->ddev = NULL; >> goto err; >> @@ -856,6 +866,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv) >> err_dev: >> dmz_put_zoned_device(ti); >> err: >> + dm_put_device(ti, dmz->metadata_dev); >> kfree(dmz); >> >> return ret; >> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h >> index 5b5e493..e789ff5 100644 >> --- a/drivers/md/dm-zoned.h >> +++ b/drivers/md/dm-zoned.h >> @@ -50,6 +50,7 @@ >> */ >> struct dmz_dev { >> struct block_device *bdev; >> + struct block_device *meta_bdev; >> >> char name[BDEVNAME_SIZE]; >> >>