On Mon, May 23, 2022 at 09:02:39PM -0600, Keith Busch wrote: > On Tue, May 24, 2022 at 08:40:46AM +0800, Ming Lei wrote: > > On Mon, May 23, 2022 at 04:36:04PM -0600, Keith Busch wrote: > > > On Wed, Apr 20, 2022 at 10:31:10PM +0800, Ming Lei wrote: > > > > So far bio is marked as REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed > > > > from userspace sync io interface, then block layer tries to poll until > > > > the bio is completed. But the current implementation calls > > > > blk_io_schedule() if bio_poll() returns 0, and this way causes io hang or > > > > timeout easily. > > > > > > Wait a second. The task's current state is TASK_RUNNING when bio_poll() returns > > > zero, so calling blk_io_schedule() isn't supposed to hang. > > > > void __sched io_schedule(void) > > { > > int token; > > > > token = io_schedule_prepare(); > > schedule(); > > io_schedule_finish(token); > > } > > > > But who can wakeup this task after scheduling out? There can't be irq > > handler for POLLED request. > > No one. If everything was working, the task state would be RUNNING, so it is > immediately available to be scheduled back in. > > > The hang can be triggered on nvme/qemu reliably: > > And clearly it's not working, but for a different reason. The polling thread > sees an invalid cookie and fails to set the task back to RUNNING, so yes, it > will sleep with no waker in the current code. > > We usually expect the cookie to be set inline with submit_bio(), but we're not > guaranteed it won't be punted off to .run_work for a variety of reasons, so > the thread writing the cookie may be racing with the reader. > > This was fine before the bio polling support since the cookie was always > returned with submit_bio() before that. > > And I would like psync to continue working with polling. As great as io_uring > is, it's just not as efficient @qd1. > > Here's a bandaid, though I assume it'll break something... > > --- > diff --git a/block/blk-mq.c b/block/blk-mq.c > index ed1869a305c4..348136dc7ba9 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1146,8 +1146,6 @@ void blk_mq_start_request(struct request *rq) > if (blk_integrity_rq(rq) && req_op(rq) == REQ_OP_WRITE) > q->integrity.profile->prepare_fn(rq); > #endif > - if (rq->bio && rq->bio->bi_opf & REQ_POLLED) > - WRITE_ONCE(rq->bio->bi_cookie, blk_rq_to_qc(rq)); > } > EXPORT_SYMBOL(blk_mq_start_request); > > @@ -2464,6 +2462,9 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio, > WARN_ON_ONCE(err); > > blk_account_io_start(rq); > + > + if (rq->bio->bi_opf & REQ_POLLED) > + WRITE_ONCE(rq->bio->bi_cookie, blk_rq_to_qc(rq)); > } Yeah, this way may improve the situation, but still can't fix all, what if the submitted bio is merged to another request in plug list? Thanks, Ming