On 3/17/21 10:54 AM, Ming Lei wrote: > On Tue, Mar 16, 2021 at 04:52:36PM +0800, JeffleXu wrote: >> >> >> On 3/16/21 3:17 PM, Ming Lei wrote: >>> On Tue, Mar 16, 2021 at 02:46:08PM +0800, JeffleXu wrote: >>>> It is a giant progress to gather all split bios that need to be polled >>>> in a per-task queue. Still some comments below. >>>> >>>> >>>> On 3/16/21 11:15 AM, Ming Lei wrote: >>>>> Currently bio based IO poll needs to poll all hw queue blindly, this way >>>>> is very inefficient, and the big reason is that we can't pass bio >>>>> submission result to io poll task. >>>>> >>>>> In IO submission context, store associated underlying bios into the >>>>> submission queue and save 'cookie' poll data in bio->bi_iter.bi_private_data, >>>>> and return current->pid to caller of submit_bio() for any DM or bio based >>>>> driver's IO, which is submitted from FS. >>>>> >>>>> In IO poll context, the passed cookie tells us the PID of submission >>>>> context, and we can find the bio from that submission context. Moving >>>>> bio from submission queue to poll queue of the poll context, and keep >>>>> polling until these bios are ended. Remove bio from poll queue if the >>>>> bio is ended. Add BIO_DONE and BIO_END_BY_POLL for such purpose. >>>>> >>>>> Usually submission shares context with io poll. The per-task poll context >>>>> is just like stack variable, and it is cheap to move data between the two >>>>> per-task queues. >>>>> >>>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> >>>>> --- >>>>> block/bio.c | 5 ++ >>>>> block/blk-core.c | 74 +++++++++++++++++- >>>>> block/blk-mq.c | 156 +++++++++++++++++++++++++++++++++++++- >>>>> include/linux/blk_types.h | 3 + >>>>> 4 files changed, 235 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/block/bio.c b/block/bio.c >>>>> index a1c4d2900c7a..bcf5eca0e8e3 100644 >>>>> --- a/block/bio.c >>>>> +++ b/block/bio.c >>>>> @@ -1402,6 +1402,11 @@ static inline bool bio_remaining_done(struct bio *bio) >>>>> **/ >>>>> void bio_endio(struct bio *bio) >>>>> { >>>>> + /* BIO_END_BY_POLL has to be set before calling submit_bio */ >>>>> + if (bio_flagged(bio, BIO_END_BY_POLL)) { >>>>> + bio_set_flag(bio, BIO_DONE); >>>>> + return; >>>>> + } >>>>> again: >>>>> if (!bio_remaining_done(bio)) >>>>> return; >>>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>>> index a082bbc856fb..970b23fa2e6e 100644 >>>>> --- a/block/blk-core.c >>>>> +++ b/block/blk-core.c >>>>> @@ -854,6 +854,40 @@ static inline void blk_bio_poll_preprocess(struct request_queue *q, >>>>> bio->bi_opf |= REQ_TAG; >>>>> } >>>>> >>>>> +static bool blk_bio_poll_prep_submit(struct io_context *ioc, struct bio *bio) >>>>> +{ >>>>> + struct blk_bio_poll_data data = { >>>>> + .bio = bio, >>>>> + }; >>>>> + struct blk_bio_poll_ctx *pc = ioc->data; >>>>> + unsigned int queued; >>>>> + >>>>> + /* lock is required if there is more than one writer */ >>>>> + if (unlikely(atomic_read(&ioc->nr_tasks) > 1)) { >>>>> + spin_lock(&pc->lock); >>>>> + queued = kfifo_put(&pc->sq, data); >>>>> + spin_unlock(&pc->lock); >>>>> + } else { >>>>> + queued = kfifo_put(&pc->sq, data); >>>>> + } >>>>> + >>>>> + /* >>>>> + * Now the bio is added per-task fifo, mark it as END_BY_POLL, >>>>> + * so we can save cookie into this bio after submit_bio(). >>>>> + */ >>>>> + if (queued) >>>>> + bio_set_flag(bio, BIO_END_BY_POLL); >>>>> + else >>>>> + bio->bi_opf &= ~(REQ_HIPRI | REQ_TAG); >>>>> + >>>>> + return queued; >>>>> +} >>>> >>>> The size of kfifo is limited, and it seems that once the sq of kfifio is >>>> full, REQ_HIPRI flag is cleared and the corresponding bio is actually >>>> enqueued into the default hw queue, which is IRQ driven. >>> >>> Yeah, this patch starts with 64 queue depth, and we can increase it to >>> 128, which should cover most of cases. >> >> It seems that the queue depth of kfifo will affect the performance as I >> did a fast test. >> >> >> >> Test Result: >> >> BLK_BIO_POLL_SQ_SZ | iodepth | IOPS >> ------------------ | ------- | ---- >> 64 | 128 | 301k (IRQ) -> 340k (iopoll) >> 64 | 16 | 304k (IRQ) -> 392k (iopoll) >> 128 | 128 | 204k (IRQ) -> 317k (iopoll) >> 256 | 128 | 241k (IRQ) -> 391k (iopoll) >> >> It seems that BLK_BIO_POLL_SQ_SZ need to be increased accordingly when >> iodepth is quite large. But I don't know why the performance in IRQ mode >> decreases when BLK_BIO_POLL_SQ_SZ is increased. > > This patchset is supposed to not affect IRQ mode because HIPRI isn't set > at IRQ mode. Or you mean '--hipri' & io_uring is setup but setting > nvme.poll_queues as 0 at your 'IRQ' mode test? > > Thanks for starting to run performance test, and so far I just run test > in KVM, not start performance test yet. > 'IRQ' means 'hipri=0' of fio configuration. The above performance test was performed on one x86 machine with one single nvme disk. I did the test on another aarch64 machine with more nvme disks, showing that this performance drop didn't occure... Please see my reply in another thread for detailed test results. -- Thanks, Jeffle