On 9/24/22 10:01, Jens Axboe wrote:
> On 9/23/22 6:59 PM, Damien Le Moal wrote:
>> On 9/24/22 05:54, Jens Axboe wrote:
>>> On 9/23/22 9:13 AM, Pankaj Raghav wrote:
>>>> On 2022-09-23 16:52, Pankaj Raghav wrote:
>>>>> On Thu, Sep 22, 2022 at 12:28:01PM -0600, Jens Axboe wrote:
>>>>>> The filesystem IO path can take advantage of allocating batches of
>>>>>> requests, if the underlying submitter tells the block layer about it
>>>>>> through the blk_plug. For passthrough IO, the exported API is the
>>>>>> blk_mq_alloc_request() helper, and that one does not allow for
>>>>>> request caching.
>>>>>>
>>>>>> Wire up request caching for blk_mq_alloc_request(), which is generally
>>>>>> done without having a bio available upfront.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>> ---
>>>>>>  block/blk-mq.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
>>>>>>  1 file changed, 71 insertions(+), 9 deletions(-)
>>>>>>
>>>>> I think we need this patch to ensure correct behaviour for passthrough:
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index c11949d66163..840541c1ab40 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -1213,7 +1213,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head)
>>>>>  	WARN_ON(!blk_rq_is_passthrough(rq));
>>>>>
>>>>>  	blk_account_io_start(rq);
>>>>> -	if (current->plug)
>>>>> +	if (blk_mq_plug(rq->bio))
>>>>>  		blk_add_rq_to_plug(current->plug, rq);
>>>>>  	else
>>>>>  		blk_mq_sched_insert_request(rq, at_head, true, false);
>>>>>
>>>>> As the passthrough path can now support request caching via
>>>>> blk_mq_alloc_request(), and it uses blk_execute_rq_nowait(), bad
>>>>> things can happen, at least for zoned devices:
>>>>>
>>>>> static inline struct blk_plug *blk_mq_plug(struct bio *bio)
>>>>> {
>>>>> 	/* Zoned block device write operation case: do not plug the BIO */
>>>>> 	if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
>>>>> 		return NULL;
>>>>> ...
>>>>
>>>> Thinking more about it, even this will not fix it because the op is
>>>> REQ_OP_DRV_OUT if it is an NVMe write for passthrough requests.
>>>>
>>>> @Damien Should the condition in blk_mq_plug() be changed to:
>>>>
>>>> static inline struct blk_plug *blk_mq_plug(struct bio *bio)
>>>> {
>>>> 	/* Zoned block device write operation case: do not plug the BIO */
>>>> 	if (bdev_is_zoned(bio->bi_bdev) && !op_is_read(bio_op(bio)))
>>>> 		return NULL;
>>>
>>> That looks reasonable to me. It'll prevent plug optimizations even
>>> for passthrough on zoned devices, but that's probably fine.
>>
>> Could do:
>>
>> 	if (blk_op_is_passthrough(bio_op(bio)) ||
>> 	    (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio))))
>> 		return NULL;
>>
>> Which I think is way cleaner, no? Unless you want to preserve plugging
>> with passthrough commands on regular (non-zoned) drives?
>
> We most certainly do; without plugging, this whole patchset is not
> functional. Nor is batched dispatch, for example.

OK. Then the change to !op_is_read() is fine.

-- 
Damien Le Moal
Western Digital Research