Hi, On Fri, Apr 3, 2020 at 8:10 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote: > > Correct that it only happens with BFQ, but whether it's a BFQ bug or > not just depends on how you define the has_work() API. If has_work() > is allowed to be in-exact then it's either a blk-mq bug or a SCSI bug > depending on how you cut it. If has_work() must be exact then it is > certainly a BFQ bug. If has_work() doesn't need to be exact then it's > not a BFQ bug. I believe that a sane API could be defined either way. > Either has_work() can be defined as a lightweight hint to trigger > heavier code or it can be defined as something exact. It's really up > to blk-mq to say how they define it. > > From his response on the SCSI patch [1], it sounded like Jens was OK > with has_work() being a lightweight hint as long as BFQ ensures that > the queues run later. ...but, as my investigation found, I believe > that BFQ _does_ try to ensure that the queue is run at a later time by > calling blk_mq_run_hw_queues(). The irony is that due to the race > we're talking about here, blk_mq_run_hw_queues() isn't guaranteed to > be reliable if has_work() is inexact. :( One way to address this is > to make blk_mq_run_hw_queues() reliable even if has_work() is inexact. > > ...so Jens: care to clarify how you'd like has_work() to be defined? Sorry to reply so quickly after my prior reply, but I might have found an extreme corner case where we can still run into the same race even if has_work() is exact. This is all theoretical from code analysis. Maybe you can poke a hole in my scenario or tell me it's so implausible that we don't care, but it seems like it's theoretically possible. For this example I'll assume a budget of 1 (AKA only one thread can get budget for a given queue): * Threads A and B both run has_work() at the same time with the same "hctx". has_work() is exact but there's no lock, so it's OK if Thread A and B both get back true. * Thread B gets interrupted for a long time right after it decides that there is work. Maybe its CPU gets an interrupt and the interrupt handler is slow. * Thread A runs, get budget, dispatches work. * Thread A's work finishes and budget is released. * Thread B finally runs again and gets budget. * Since Thread A already took care of the work and no new work has come in, Thread B will get NULL from dispatch_request(). I believe this is specifically why dispatch_request() is allowed to return NULL in the first place if has_work() must be exact. * Thread B will now be holding the budget and is about to call put_budget(), but hasn't called it yet. * Thread B gets interrupted for a long time (again). Dang interrupts. * Now Thread C (with a different "hctx" but the same queue) comes along and runs blk_mq_do_dispatch_sched(). * Thread C won't do anything because it can't get budget. * Finally Thread B will run again and put the budget without kicking any queues. Now we have a potential I/O stall because nobody will ever kick the queues. I think the above example could happen even on non-BFQ systems and I think it would also be fixed by an approach like the one in my patch. -Doug