Re: [RFC 0/3] Add support of iopoll for dm device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 20 2020 at  2:54am -0400,
Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:

> This patch set adds support of iopoll for dm device.
> 
> This is only an RFC patch. I'm really looking forward getting your
> feedbacks on if you're interested in supporting iopoll for dm device,
> or if there's a better design to implement that.
> 
> Thanks.
> 
> 
> [Purpose]
> IO polling is an important mode of io_uring. Currently only mq devices
> support iopoll. As for dm devices, only dm-multipath is request-base,
> while others are all bio-based and have no support for iopoll.
> Supporting iopoll for dm devices can be of great meaning when the
> device seen by application is dm device such as dm-linear/dm-stripe,
> in which case iopoll is not usable for io_uring.

I appreciate you taking the initiative on this; polling support is on my
TODO so your work serves as a nice reminder to pursue this more
urgently.

> [Design Notes]
> 
> cookie
> ------
> Let's start from cookie. Cookie is one important concept in iopoll. It
> is used to identify one specific request in one specific hardware queue.
> The concept of cookie is initially designed as a per-bio concept, and
> thus it doesn't work well when bio-split involved. When bio is split,
> the returned cookie is indeed the cookie of one of the split bio, and
> the following polling on this returned cookie can only guarantee the
> completion of this specific split bio, while the other split bios may
> be still uncompleted. Bio-split is also resolved for dm device, though
> in another form, in which case the original bio submitted to the dm
> device may be split into multiple bios submitted to the underlying
> devices.
> 
> In previous discussion, Lei Ming has suggested that iopoll should be
> disabled for bio-split. This works for the normal bio-split (done in
> blk_mq_submit_bio->__blk_queue_split), while iopoll will never be
> usable for dm devices if this also applies for dm device.
> 
> So come back to the design of the cookie for dm devices. At the very
> beginning I want to refactor the design of cookie, from 'unsigned int'
> type to the pointer type for dm device, so that cookie can point to
> something, e.g. a list containing all cookies of one original bio,
> something like this:
> 
> struct dm_io { // the returned cookie points to dm_io
> 	...
> 	struct list_head cookies; 
> };
> 
> In this design, we can fetch all cookies of one original bio, but the
> implementation can be difficult and buggy. For example, the
> 'struct dm_io' structure may be already freed when the returned cookie
> is used in blk_poll(). Then what if maintain a refcount in struct dm_io
> so that 'struct dm_io' structure can not be freed until blk_poll()
> called? Then the 'struct dm_io' structure will never be freed if the
> IO polling is not used at all.

I'd have to look closer at the race in the code you wrote (though you
didn't share it); but we cannot avoid properly mapping a cookie to each
split bio.  Otherwise you resort to inefficiently polling everything.

Seems your attempt to have the cookie point to a dm_io object was likely
too coarse-grained (when bios are split they get their own dm_io on
recursive re-entry to DM core from ->submit_bio); but isn't having a
list of cookies still too imprecise for polling purposes?  You could
easily have a list that translates to many blk-mq queues.  Possibly
better than your current approach of polling everything -- but not
much.

> So finally I gived up refactoring the design of cookie and maintain all
> cookies of one original bio. The initial design of cookie gets retained.
> The returned cookie is still the cookie of one of the split bio, and thus
> it is not used at all when polling dm devices. The polling of dm device,
> is indeed iterating and polling on all hardware queues (in polling mode)
> of all underlying target devices.
>
> This design is simple and easy to implement. But I'm not sure if the
> performance can be an issue if one dm device mapped to too many target
> devices or the dm stack depth grows.

You showed 40% performance improvement but it really just does the bare
minimum of implementing polling.  It is so simplistic that I really
don't think the design even serves as a starting point for what will be
needed.  So this needs further design time.

What you've _done_ could serve as a stop-gap but I'd really rather we
get it properly designed from the start.

> REQ_NOWAIT
> ----------
> The polling bio will set REQ_NOWAIT in bio->bi_flags, and the device
> need to be capable of handling REQ_NOWAIT. Commit 021a24460dc2
> ("block: add QUEUE_FLAG_NOWAIT") and commit 6abc49468eea ("dm: add
> support for REQ_NOWAIT and enable it for linear target") add this
> support for dm device, and currently only dm-linear supports that.
> 
> hybrid polling
> --------------
> When execute hybrid polling, cookie is used to fetch the request,
> and check if the request has completed or not. In the design for
> dm device described above, the returned cookie is still the cookie
> of one of the split bios, and thus we have no way checking if all
> the split bios have completed or not. Thus in the current
> implementation, hybrid polling is not supported for dm device.

I'll look closer at all this.  But DM targets do allow for additional
target specific per-bio-data to be added to the overall per-bio-data
size (via ti->per_io_data_size).  DM core _could_ internalize adding
additional space to per-bio-data for storing a unique cookie per
split/clone bio.

Conversely, block-core's bio cloning could manage this so long as it
knew how to access the offset into the established per-bio-data.  But
that might be more challenging to do without impacting use-cases outside
of DM.

Thanks,
Mike


> [Performance]
> I test 4k-randread on a dm-linear mapped to only one nvme device.
> 
> SSD: INTEL SSDPEDME800G4
> dm-linear:dmsetup create testdev --table "0 209715200 linear /dev/nvme0n1 0"
> 
> fio configs:
> ```
> ioengine=io_uring
> iodepth=128
> numjobs=1
> thread
> rw=randread
> direct=1
> registerfiles=1
> #hipri=1 with iopoll enabled, hipri=0 with iopoll disabled
> bs=4k
> size=100G
> runtime=60
> time_based
> group_reporting
> randrepeat=0
> 
> [device]
> filename=/dev/mapper/testdev
> ```
> 
> The test result shows that there's ~40% performance growth when iopoll
> enabled.
> 
> test | iops | bw | avg lat
> ---- | ---- | ---- | ----
> iopoll-disabled | 244k | 953MiB/s  | 524us
> iopoll-enabled  | 345k | 1349MiB/s | 370us
> 
> [TODO]
> The store method of "io_poll"/"io_poll_delay" has not been adapted
> for dm device.
> 
> 
> Jeffle Xu (3):
>   block/mq: add iterator for polling hw queues
>   block: add back ->poll_fn in request queue
>   dm: add support for IO polling
> 
>  block/blk-mq.c         | 30 ++++++++++++++++++++++++------
>  drivers/md/dm-core.h   |  1 +
>  drivers/md/dm-table.c  | 30 ++++++++++++++++++++++++++++++
>  drivers/md/dm.c        | 40 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/blk-mq.h |  6 ++++++
>  include/linux/blkdev.h |  3 +++
>  6 files changed, 104 insertions(+), 6 deletions(-)
> 
> -- 
> 2.27.0
> 




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux