Re: [PATCH V4 08/10] block: allow to allocate req with RQF_PREEMPT when queue is preempt frozen

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 15 Sep 2017 00:18:54 +0800

On Thu, Sep 14, 2017 at 01:37:14PM +0000, Bart Van Assche wrote:
> On Thu, 2017-09-14 at 09:15 +0800, Ming Lei wrote:
> > On Wed, Sep 13, 2017 at 07:07:53PM +0000, Bart Van Assche wrote:
> > > On Thu, 2017-09-14 at 01:48 +0800, Ming Lei wrote:
> > > > No, that patch only changes blk_insert_cloned_request() which is used
> > > > by dm-rq(mpath) only, nothing to do with the reported issue during
> > > > suspend and sending SCSI Domain validation.
> > > 
> > > There may be other ways to fix the SCSI domain validation code.
> > 
> > Again the issue isn't in domain validation, it is in quiesce,
> > so we need to fix quiesce, instead of working around transport_spi.
> > 
> > Also What is the other way? Why not this patchset?
> 
> Sorry if I had not made this clear enough but I don't like the approach of
> this patch series so please do not expect any "Reviewed-by" tags from me.
> As the discussion about v4 of this patch series made clear the interaction
> between blk_cleanup_queue() and the changes introduced by this patch series
> in blk_get_request() is subtle and hard to analyze. The blk-mq core is

No, it isn't subtle at all. As I explained, the queue can be marked
dying while a request is being allocated, in both legacy and blk-mq,
and the driver is required to handle requests after the queue becomes
dying. It has worked this way for a long time.

Is that really hard to analyze?

> already complicated. In my view patches that make the blk-mq core simpler
> are much more welcome than patches that make the blk-mq core more
> complicated.

Sorry, I can't agree that this patchset is too complicated; it just
touches the quiesce interface. The other changes, such as holding the
queue usage counter, follow blk-mq's way, and we can reuse that approach
for legacy too.

> 
> Since I expect that any fix for the interaction between blk-mq and power
> management will be integrated in kernel v4.15 at earliest there is no reason

Again, it isn't related to PM only; it is actually related to
SCSI quiesce.

> to rush. My proposal is to wait a few weeks and to see whether anyone comes
> up with a better solution.

I am open to any solution and happy to review one if someone posts
it, but it should cover at least the two kinds of reported issues.

However, I won't wait for that, since people have been troubled by this
a lot. In Oleksandr's case, the system is simply dead after one
suspend. And the I/O hang when sending SCSI domain validation was
actually reported from a production system too.

-- 
Ming