Re: [PATCH v2 2/2] blk-mq: Rerun dispatching in the case of budget contention

Doug Anderson <dianders@xxxxxxxxxxxx> · Fri, 3 Apr 2020 08:49:54 -0700

Hi,

On Fri, Apr 3, 2020 at 8:10 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Correct that it only happens with BFQ, but whether it's a BFQ bug or
> not just depends on how you define the has_work() API.  If has_work()
> is allowed to be in-exact then it's either a blk-mq bug or a SCSI bug
> depending on how you cut it.  If has_work() must be exact then it is
> certainly a BFQ bug.  If has_work() doesn't need to be exact then it's
> not a BFQ bug.  I believe that a sane API could be defined either way.
> Either has_work() can be defined as a lightweight hint to trigger
> heavier code or it can be defined as something exact.  It's really up
> to blk-mq to say how they define it.
>
> From his response on the SCSI patch [1], it sounded like Jens was OK
> with has_work() being a lightweight hint as long as BFQ ensures that
> the queues run later.  ...but, as my investigation found, I believe
> that BFQ _does_ try to ensure that the queue is run at a later time by
> calling blk_mq_run_hw_queues().  The irony is that due to the race
> we're talking about here, blk_mq_run_hw_queues() isn't guaranteed to
> be reliable if has_work() is inexact.  :(  One way to address this is
> to make blk_mq_run_hw_queues() reliable even if has_work() is inexact.
>
> ...so Jens: care to clarify how you'd like has_work() to be defined?

Sorry to reply so quickly after my prior reply, but I might have found
an extreme corner case where we can still run into the same race even
if has_work() is exact.  This is all theoretical from code analysis.
Maybe you can poke a hole in my scenario or tell me it's so
implausible that we don't care, but it seems like it's theoretically
possible.  For this example I'll assume a budget of 1 (AKA only one
thread can get budget for a given queue):

* Threads A and B both run has_work() at the same time with the same
  "hctx".  has_work() is exact but there's no lock, so it's OK if
  Thread A and B both get back true.

* Thread B gets interrupted for a long time right after it decides
  that there is work.  Maybe its CPU gets an interrupt and the
  interrupt handler is slow.

* Thread A runs, get budget, dispatches work.

* Thread A's work finishes and budget is released.

* Thread B finally runs again and gets budget.

* Since Thread A already took care of the work and no new work has
  come in, Thread B will get NULL from dispatch_request().  I believe
  this is specifically why dispatch_request() is allowed to return
  NULL in the first place if has_work() must be exact.

* Thread B will now be holding the budget and is about to call
  put_budget(), but hasn't called it yet.

* Thread B gets interrupted for a long time (again).  Dang interrupts.

* Now Thread C (with a different "hctx" but the same queue) comes
  along and runs blk_mq_do_dispatch_sched().

* Thread C won't do anything because it can't get budget.

* Finally Thread B will run again and put the budget without kicking
  any queues.

Now we have a potential I/O stall because nobody will ever kick the
queues.

I think the above example could happen even on non-BFQ systems and I
think it would also be fixed by an approach like the one in my patch.

-Doug