Re: [PATCH V2 RESEND 4/5] blk-mq: re-submit IO in case that hctx is dead

Hannes Reinecke <hare@xxxxxxx> · Mon, 7 Oct 2019 08:27:38 +0200

On 10/6/19 4:45 AM, Ming Lei wrote:
> When all CPUs in one hctx are offline, we shouldn't run this hw queue
> for completing request any more.
> 
> So steal bios from the request, and resubmit them, and finally free
> the request in blk_mq_hctx_notify_dead().
> 
> Cc: Bart Van Assche <bvanassche@xxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Keith Busch <keith.busch@xxxxxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> ---
>  block/blk-mq.c | 48 +++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 41 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d991c122abf2..0b35fdbd1f17 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2280,10 +2280,30 @@ static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
>  	return 0;
>  }
>  
> +static void blk_mq_resubmit_io(struct request *rq)
> +{
> +	struct bio_list list;
> +	struct bio *bio;
> +
> +	bio_list_init(&list);
> +	blk_steal_bios(&list, rq);
> +
> +	while (true) {
> +		bio = bio_list_pop(&list);
> +		if (!bio)
> +			break;
> +
> +		generic_make_request(bio);
> +	}
> +
> +	blk_mq_cleanup_rq(rq);
> +	blk_mq_end_request(rq, 0);
> +}
> +
Hmm. Not sure if this is a good idea.
Shouldn't we call 'blk_mq_end_request()' before calling
generic_make_request()?
Otherwise the new (cloned) request built from the resubmitted bios might
be completed before the original one, which looks a bit dodgy to me; and
it might lead to quite deep recursion if we have several dead CPUs to
contend with ...
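
A minimal userspace sketch of the ordering suggested above: steal the bios, end the original request, and only then resubmit the stolen bios. This is not kernel code. struct bio, struct request, event_log and every model_* helper are hypothetical stand-ins, not the block layer's types or APIs. The recursion concern is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; userspace model only. */
struct bio {
	int id;
	struct bio *next;
};

struct bio_list {
	struct bio *head;
};

struct request {
	struct bio *bio;	/* chain of bios still owned by the request */
	int ended;
};

char event_log[128];	/* records the order of end/submit events */

static void log_event(const char *ev)
{
	strncat(event_log, ev, sizeof(event_log) - strlen(event_log) - 1);
}

/* Move the whole bio chain out of rq; rq owns no bios afterwards. */
static void model_steal_bios(struct bio_list *list, struct request *rq)
{
	list->head = rq->bio;
	rq->bio = NULL;
}

static struct bio *model_bio_list_pop(struct bio_list *list)
{
	struct bio *b = list->head;

	if (b) {
		list->head = b->next;
		b->next = NULL;
	}
	return b;
}

static void model_end_request(struct request *rq)
{
	assert(rq->bio == NULL);	/* stolen bios must not be completed here */
	rq->ended = 1;
	log_event("end;");
}

static void model_submit_bio(struct bio *bio)
{
	char buf[16];

	snprintf(buf, sizeof(buf), "submit:%d;", bio->id);
	log_event(buf);
}

/* Suggested order: end the original request first, then resubmit. */
static void model_resubmit_io(struct request *rq)
{
	struct bio_list list;
	struct bio *bio;

	model_steal_bios(&list, rq);
	model_end_request(rq);
	while ((bio = model_bio_list_pop(&list)) != NULL)
		model_submit_bio(bio);
}

void model_demo(void)
{
	struct bio b3 = { 3, NULL };
	struct bio b2 = { 2, &b3 };
	struct bio b1 = { 1, &b2 };
	struct request rq = { &b1, 0 };

	event_log[0] = '\0';
	model_resubmit_io(&rq);
	assert(rq.ended);
}
```

After model_demo(), event_log reads "end;submit:1;submit:2;submit:3;", i.e. the original request is finished before any of its bios are submitted again.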

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@xxxxxxx			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer