On 1/9/20 6:46 AM, Dmitry Fomichev wrote: >> -----Original Message----- >> From: linux-block-owner@xxxxxxxxxxxxxxx <linux-block- >> owner@xxxxxxxxxxxxxxx> On Behalf Of Bob Liu >> Sent: Wednesday, January 8, 2020 8:46 AM >> To: Damien Le Moal <Damien.LeMoal@xxxxxxx>; dm-devel@xxxxxxxxxx >> Cc: linux-block@xxxxxxxxxxxxxxx; snitzer@xxxxxxxxxx; Dmitry Fomichev >> <Dmitry.Fomichev@xxxxxxx>; martin.petersen@xxxxxxxxxx; >> shirley.ma@xxxxxxxxxx >> Subject: Re: [RFC PATCH] dm-zoned: extend the way of exposing zoned >> block device >> >> On 1/8/20 3:40 PM, Damien Le Moal wrote: >>> On 2020/01/08 16:13, Nobody wrote: >>>> From: Bob Liu <bob.liu@xxxxxxxxxx> >>>> >>>> Motivation: >>>> Now the dm-zoned device mapper target exposes a zoned block >> device(ZBC) as a >>>> regular block device by storing metadata and buffering random writes in >>>> conventional zones. >>>> This way is not very flexible, there must be enough conventional zones >> and the >>>> performance may be constrained. >>>> By putting metadata(also buffering random writes) in separated device >> we can get >>>> more flexibility and potential performance improvement e.g by storing >> metadata >>>> in faster device like persistent memory. >>>> >>>> This patch try to split the metadata of dm-zoned to an extra block >>>> device instead of zoned block device itself. >>>> (Buffering random writes also in the todo list.) >>>> >>>> Patch is at the very early stage, just want to receive some feedback about >>>> this extension. >>>> Another option is to create an new md-zoned device with separated >> metadata >>>> device based on md framework. >>> >>> For metadata only, it should not be hard at all to move to another >>> conventional zone device. It will however be a little more tricky for >>> conventional zones used for data since dm-zoned assumes that this >> random >>> write space is also zoned. Moving this space to a conventional device >>> requires implementing a zone emulation (fake zones) for the regular >>> drive, using a zone size that matches the size of sequential zones. >>> >> >> Indeed. >> I'll try to implement zone emulation next version. >> >>> Beyond this, dm-zoned also needs to be changed to accept partial drives >>> and the dm core code to accept mixing of regular and zoned disks (that >>> is forbidden now). >>> >> >> Do you mean the check in device_area_is_invalid()? >> >> 320 /* >> 321 * If the target is mapped to zoned block device(s), check >> 322 * that the zones are not partially mapped. >> 323 */ >> 324 if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) { >> >>> Another approach worth exploring is stacking dm-zoned as is on top of a >>> modified dm-linear with the ability to emulate conventional zones on top >>> of a regular block device (you only need report zones method >>> implemented). That is much easier to do. We actually hacked something >>> like that last month to see the performance change and saw a jump from >>> 56 MB/s to 250 MB/s for write intensive workloads (file creation on >>> ext4). I am not so warm in this approach though as it modifies dm-linear >>> and we want to keep it very lean and simple. Modifying dm-zoned to allow >>> support for a device pair is a better approach I think. >>> >>>> >>>> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx> >>>> --- >>>> drivers/md/dm-zoned-metadata.c | 6 +++--- >>>> drivers/md/dm-zoned-target.c | 15 +++++++++++++-- >>>> drivers/md/dm-zoned.h | 1 + >>>> 3 files changed, 17 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned- >> metadata.c >>>> index 22b3cb0..89cd554 100644 >>>> --- a/drivers/md/dm-zoned-metadata.c >>>> +++ b/drivers/md/dm-zoned-metadata.c >>>> @@ -439,7 +439,7 @@ static struct dmz_mblock >> *dmz_get_mblock_slow(struct dmz_metadata *zmd, >>>> >>>> /* Submit read BIO */ >>>> bio->bi_iter.bi_sector = dmz_blk2sect(block); >>>> - bio_set_dev(bio, zmd->dev->bdev); >>>> + bio_set_dev(bio, zmd->dev->meta_bdev); >>>> bio->bi_private = mblk; >>>> bio->bi_end_io = dmz_mblock_bio_end_io; >>>> bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO); >>>> @@ -593,7 +593,7 @@ static int dmz_write_mblock(struct dmz_metadata >> *zmd, struct dmz_mblock *mblk, >>>> set_bit(DMZ_META_WRITING, &mblk->state); >>>> >>>> bio->bi_iter.bi_sector = dmz_blk2sect(block); >>>> - bio_set_dev(bio, zmd->dev->bdev); >>>> + bio_set_dev(bio, zmd->dev->meta_bdev); >>>> bio->bi_private = mblk; >>>> bio->bi_end_io = dmz_mblock_bio_end_io; >>>> bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); >>>> @@ -620,7 +620,7 @@ static int dmz_rdwr_block(struct dmz_metadata >> *zmd, int op, sector_t block, >>>> return -ENOMEM; >>>> >>>> bio->bi_iter.bi_sector = dmz_blk2sect(block); >>>> - bio_set_dev(bio, zmd->dev->bdev); >>>> + bio_set_dev(bio, zmd->dev->meta_bdev); >>>> bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO); >>>> bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0); >>>> ret = submit_bio_wait(bio); >>>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned- >> target.c >>>> index 70a1063..55c64fe 100644 >>>> --- a/drivers/md/dm-zoned-target.c >>>> +++ b/drivers/md/dm-zoned-target.c >>>> @@ -39,6 +39,7 @@ struct dm_chunk_work { >>>> */ >>>> struct dmz_target { >>>> struct dm_dev *ddev; >>>> + struct dm_dev *metadata_dev; >>> >>> This naming would be confusing as it implies metadata only. If you also >>> want to move the random write zones to a regular device, then I would >>> suggest names like: >>> >>> ddev -> zoned_bdev (the zoned device, e.g. SMR disk) >>> metadata_dev -> reg_bdev (regular block device, e.g. an SSD) >>> >> >> Will update. >> >>> The metadata+random write (fake) zones space can use the regular block >>> device, and all sequential zones are assumed to be on the zoned device. >>> Care must be taken to handle the case of a zoned device that has >>> conventional zones: these could be used as is, not needing any reclaim, >> >> Agree, that won't make things too complicate. >> Thank you for all the suggestions. >> >> Regards, >> Bob >> >>> so potentially contributing to further optimization. >>> >>>> >>>> unsigned long flags; >>>> >>>> @@ -701,6 +702,7 @@ static int dmz_get_zoned_device(struct dm_target >> *ti, char *path) >>>> goto err; >>>> } >>>> >>>> + dev->meta_bdev = dmz->metadata_dev->bdev; >>>> dev->bdev = dmz->ddev->bdev; >>>> (void)bdevname(dev->bdev, dev->name); >>>> >>>> @@ -761,7 +763,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned >> int argc, char **argv) >>>> int ret; >>>> >>>> /* Check arguments */ >>>> - if (argc != 1) { >>>> + if (argc != 2) { >>>> ti->error = "Invalid argument count"; >>>> return -EINVAL; >>>> } > > I like the idea to have an additional device in dm-zoned as the source of random > zones as opposed to using block range concatenation via dm-linear. But, > shouldn't we retain the possibility to use just the conventional zones that exist > on the drive? This seems necessary for backwards compatibility. In the code > above, if the arg count is one, the processing probably should fall back to the > existing way of using drive's conventional zones. > Yeah, that's better. Will update. Thanks! -Bob >>>> @@ -774,8 +776,16 @@ static int dmz_ctr(struct dm_target *ti, unsigned >> int argc, char **argv) >>>> } >>>> ti->private = dmz; >>>> >>>> + /* Get the metadata block device */ >>>> + ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), >>>> + &dmz->metadata_dev); >>>> + if (ret) { >>>> + dmz->metadata_dev = NULL; >>>> + goto err; >>>> + } >>>> + >>>> /* Get the target zoned block device */ >>>> - ret = dmz_get_zoned_device(ti, argv[0]); >>>> + ret = dmz_get_zoned_device(ti, argv[1]); >>>> if (ret) { >>>> dmz->ddev = NULL; >>>> goto err; >>>> @@ -856,6 +866,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned >> int argc, char **argv) >>>> err_dev: >>>> dmz_put_zoned_device(ti); >>>> err: >>>> + dm_put_device(ti, dmz->metadata_dev); >>>> kfree(dmz); >>>> >>>> return ret; >>>> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h >>>> index 5b5e493..e789ff5 100644 >>>> --- a/drivers/md/dm-zoned.h >>>> +++ b/drivers/md/dm-zoned.h >>>> @@ -50,6 +50,7 @@ >>>> */ >>>> struct dmz_dev { >>>> struct block_device *bdev; >>>> + struct block_device *meta_bdev; >>>> >>>> char name[BDEVNAME_SIZE]; >>>> >>>> >