On Fri, Aug 25, 2017 at 12:32:33PM -0400, Mike Snitzer wrote:
> On Fri, Aug 25 2017 at 12:08pm -0400,
> Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> 
> > On Fri, Aug 25, 2017 at 11:48:39AM -0400, Mike Snitzer wrote:
> > > On Fri, Aug 25 2017 at 11:27am -0400,
> > > Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > > 
> > > > We don't need to update the original dm request partially
> > > > when ending each cloned bio; this patch updates the original
> > > > dm request just once, when the whole cloned request is finished.
> > > > 
> > > > Partial request updates can be a bit expensive, so we should
> > > > try to avoid them, especially since they run in softirq context.
> > > > 
> > > > After this patch is applied, neither the hard lockup nor the
> > > > soft lockup can be reproduced any more in one hour of running
> > > > Laurence's test[1] on IB/SRP. Without this patch, the lockup
> > > > can be reproduced within several minutes.
> > > > 
> > > > BTW, after d4acf3650c7c ("block: Make blk_mq_delay_kick_requeue_list()
> > > > rerun the queue at a quiet time"), the test has to be made more
> > > > aggressive to reproduce the lockup:
> > > > 
> > > > 1) run 32 or 64 instances of hammer_write.sh concurrently.
> > > > 2) write 8M each time
> > > > 
> > > > [1] https://marc.info/?l=linux-block&m=150220185510245&w=2
> > > 
> > > Bart said he cannot reproduce the lockups with his patchset applied.
> > > Have you tested using Bart's patchset?
> > 
> > d4acf3650c7c ("block: Make blk_mq_delay_kick_requeue_list() rerun the
> > queue at a quiet time") is already in Linus' tree.
> > 
> > I haven't tested the other patches yet, because every time the lockup
> > is triggered it is always in blk_recalc_rq_segments(), and I don't see
> > any patch dealing with that.
> 
> Please test with all of Bart's patches applied!

I have just run the test with Bart's patches applied, and the soft lockup
still shows up after running the test described in the commit log for a
couple of minutes.

BTW, my test is much more aggressive than Laurence's: I write 8M each
time and run 64 instances of hammer_write.sh concurrently.

I don't think the two results are contradictory. Either way, this patch
decreases the softirq CPU utilization, so it is an improvement.

> 
> > > Comments inlined below.
> > > 
> > > > ---
> > > >  drivers/md/dm-rq.c | 15 +++++----------
> > > >  drivers/md/dm-rq.h |  1 +
> > > >  2 files changed, 6 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> > > > index c6ebc5b1e00e..50cd96c7de45 100644
> > > > --- a/drivers/md/dm-rq.c
> > > > +++ b/drivers/md/dm-rq.c
> > > > @@ -117,9 +117,9 @@ static void end_clone_bio(struct bio *clone)
> > > >  	struct dm_rq_clone_bio_info *info =
> > > >  		container_of(clone, struct dm_rq_clone_bio_info, clone);
> > > >  	struct dm_rq_target_io *tio = info->tio;
> > > > -	struct bio *bio = info->orig;
> > > >  	unsigned int nr_bytes = info->orig->bi_iter.bi_size;
> > > >  	blk_status_t error = clone->bi_status;
> > > > +	bool is_last = !clone->bi_next;
> > > > 
> > > >  	bio_put(clone);
> > > > 
> > > > @@ -144,21 +144,15 @@ static void end_clone_bio(struct bio *clone)
> > > >  	 * I/O for the bio successfully completed.
> > > >  	 * Notice the data completion to the upper layer.
> > > >  	 */
> > > > -
> > > > -	/*
> > > > -	 * bios are processed from the head of the list.
> > > > -	 * So the completing bio should always be rq->bio.
> > > > -	 * If it's not, something wrong is happening.
> > > > -	 */
> > > > -	if (tio->orig->bio != bio)
> > > > -		DMERR("bio completion is going in the middle of the request");
> > > 
> > > Why did you remove this check?
> > 
> > The check isn't valid any more, because this patch only updates the
> > original dm rq in the .end_io() of the last cloned bio.
> 
> Fair enough, so just a side-effect of your all or nothing completion
> handling.

It is not all or nothing; a partial update can still be done, just as in
the current code.

> > > > +	tio->completed += nr_bytes;
> > > > 
> > > >  	/*
> > > >  	 * Update the original request.
> > > >  	 * Do not use blk_end_request() here, because it may complete
> > > >  	 * the original request before the clone, and break the ordering.
> > > >  	 */
> > > > -	blk_update_request(tio->orig, BLK_STS_OK, nr_bytes);
> > > > +	if (is_last)
> > > > +		blk_update_request(tio->orig, BLK_STS_OK, tio->completed);
> > > >  }
> > > 
> > > Partial completion support is important given the potential for path
> > > failures interrupting requests. Why do you think it is OK to avoid it
> > > by switching to an all or nothing approach?
> > 
> > If the cloned rq is partially completed, this dm rq is still partially
> > completed. This patch only updates the dm rq in the last cloned bio's
> > .end_io().
> 
> Which is exactly what we do _not_ want because it removed partial
> completion. Which is an important feature of DM multipath. Christoph
> echoed its importance some time ago:
> https://www.redhat.com/archives/dm-devel/2015-May/msg00228.html
> 
> > Also, if one cloned bio in the middle completes with an error, the
> > current implementation doesn't update the dm rq from that bio onwards,
> > so the following patch looks consistent with the current
> > implementation. What do you think of it?
> 
> Not seeing how. Yes, the current implementation will not account for a
> part of the request that failed. That is fine.. the bio failed, so
> nothing to update!
> 
> So no, I don't understand why you've added the 'exit' goto to update the
> request on error. If the request is to be failed upwards it'll get

It is for updating the previously completed (successful) bios, because
'tio->completed' records the completed/successful bytes; that is the same
as the current implementation. The difference is that this patch makes a
single blk_update_request() call to do so, and most of the time it is a
whole-request update, whereas the current code calls blk_update_request()
once for every cloned bio.

Since blk_update_request() on a partial completion needs to recalculate
the segments, which is very expensive, we should avoid it.
blk_update_request() is called in softirq context, which is why I believe
it is the main reason for the lockup, and my test shows no lockup any
more with this patch.

Hope I have explained it clearly. :-)

--
Ming
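
For reference, with this patch end_clone_bio() should end up looking
roughly like the sketch below, reconstructed from the hunks quoted above.
The error path around the 'exit' label and the new 'completed' counter in
struct dm_rq_target_io (the one-line dm-rq.h change in the diffstat) are
not visible in the quoted hunks, so those details are assumptions based
on the discussion rather than the actual patch:

/*
 * Sketch only -- reconstructed from the quoted diff, not the
 * authoritative patch.  Assumes dm-rq.h gained an "unsigned completed;"
 * member in struct dm_rq_target_io, initialized to 0 when the tio is
 * set up.
 */
static void end_clone_bio(struct bio *clone)
{
	struct dm_rq_clone_bio_info *info =
		container_of(clone, struct dm_rq_clone_bio_info, clone);
	struct dm_rq_target_io *tio = info->tio;
	unsigned int nr_bytes = info->orig->bi_iter.bi_size;
	blk_status_t error = clone->bi_status;
	bool is_last = !clone->bi_next;

	bio_put(clone);

	if (tio->error)
		/*
		 * An error was already seen on this request; let the
		 * clone request's ->end_io() deal with the remainder.
		 */
		return;
	else if (error) {
		/*
		 * Record the error for the target driver to act on at
		 * request completion time, but still report the bytes
		 * that completed successfully so far: the goto jumps
		 * into the if body below (the 'exit' goto discussed
		 * above).
		 */
		tio->error = error;
		goto exit;
	}

	/*
	 * The bio completed successfully: only accumulate the byte
	 * count here instead of updating the original request right
	 * away.
	 */
	tio->completed += nr_bytes;

	/*
	 * Update the original request once, when the last cloned bio
	 * has completed (or when an error forces an early partial
	 * update).  Still not blk_end_request(), so the original
	 * request cannot complete before the clone.
	 */
	if (is_last)
exit:
		blk_update_request(tio->orig, BLK_STS_OK, tio->completed);
}

The point of the sketch is the shape of the completion path: bytes are
only accumulated per cloned bio, and blk_update_request() runs once,
either from the last cloned bio or from the first error.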