Re: [PATCH v3 4/8] scsi: call scsi_stop_queue() without state_mutex held

Martin Wilck <mwilck@xxxxxxxx> · Mon, 12 Jun 2023 13:15:22 +0200

On Thu, 2023-06-08 at 13:54 -0500, Mike Christie wrote:
> On 6/8/23 9:12 AM, Bart Van Assche wrote:
> > On 6/7/23 22:44, Christoph Hellwig wrote:
> > > > > Thanks. This wasn't obvious to me from the current code. I'll
> > > > > add a
> > > > > comment in the next version.
> > > > 
> > > > The crucial question is now, is it sufficient to call
> > > > blk_mq_quiesce_queue_nowait() under the mutex, or does the call
> > > > to
> > > > blk_mq_wait_quiesce_done() have to be under the mutex, too?
> > > > The latter would actually kill off our attempt to fix the delay
> > > > in fc_remote_port_delete() that was caused by repeated
> > > > synchronize_rcu() calls.
> > > > 
> > > > But if I understand you correctly, moving the wait out of the
> > > > mutex
> > > > should be ok. I'll update the series accordingly.
> > > 
> > > I can't think of a reason we'd want to lock over the wait, but
> > > Bart
> > > knows this code way better than I do.
> > 
> > Unless deep changes would be made in the block layer
> > blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done()
> > functions, moving blk_mq_wait_quiesce_done() outside the critical
> > section seems fine to me.
> 
> If it helps, I tested with iscsi and the mutex changes discussed
> above
> and it worked ok for me.

I guess the race that Bart was hinting at is hard to trigger.

I would like to remark that the fact that we need to hold the SCSI
state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a
layering issue to me. Not sure if, and how, this could be avoided,
though.

Regards
Martin