RE: [PATCH RFC v6 08/10] megaraid_sas: switch fusion adapters to MQ

Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> · Fri, 1 May 2020 00:48:46 +0530

> -----Original Message-----
> From: linux-scsi-owner@xxxxxxxxxxxxxxx [mailto:linux-scsi-
> owner@xxxxxxxxxxxxxxx] On Behalf Of John Garry
> Sent: Thursday, April 30, 2020 11:11 PM
> To: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
> Cc: axboe@xxxxxxxxx; jejb@xxxxxxxxxxxxx; martin.petersen@xxxxxxxxxx;
> ming.lei@xxxxxxxxxx; bvanassche@xxxxxxx; hare@xxxxxxx;
> don.brace@xxxxxxxxxxxxx; Sumit Saxena <sumit.saxena@xxxxxxxxxxxx>;
> hch@xxxxxxxxxxxxx; Shivasharan Srikanteshwara
> <shivasharan.srikanteshwara@xxxxxxxxxxxx>; chenxiang (M)
> <chenxiang66@xxxxxxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx; linux-
> scsi@xxxxxxxxxxxxxxx; esc.storagedev@xxxxxxxxxxxxx; Hannes Reinecke
> <hare@xxxxxxxx>
> Subject: Re: [PATCH RFC v6 08/10] megaraid_sas: switch fusion adapters to
> MQ
>
> On 29/04/2020 18:55, John Garry wrote:
> >
>
> Hi Kashyap,
>
> > ok, so it's not proper to use active hctx per request queue as
> > "users", but rather that the active request_queue's per host (which is
> > what we effectively have for nr_hw_queues = 1). Just a small comment
> > at the bottom on your change.
> >
> > So I already experimented with reducing shost.can_queue significantly
> > on hisi_sas (by a factor of 8, from 4096->512, and I use 12x SAS SSD),
> > and
> > saw:
> >
> > RFC + shost.can_queue reduced: ~1.2M IOPS With RFC + shost.can_queue
> > unmodified: ~2M IOPS Without RFC + shost.can_queue unmodified: ~2M
> > IOPS
> >
> > And with the change, below, I still get around 1.2M IOPS. But there
> > may be some sweet spot/zone where this makes a difference, which I'm
> > not visiting.
> >
> >>
> >> Combination of your patch and below resolves fairness issues. I see
> >> some better results using " --cpus_allowed_policy=split", but
> >> --cpus_allowed_policy=shared
> >
> > I did not try changing the cpus_allowed_policy policy, and so I would
> > be using default, which I believe is shared.
> >
> > is still having some issue and most likely it
> >> is to do with fairness. If fairness is not managed properly, there is
> >> high contention in wait/wakeup handling of sbitmap.
> >
> > ok, I can investigate.
>
> Could you also try this change:
>
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h index
> 0a57e4f041a9..e959971b1cee 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -46,11 +46,25 @@ extern void blk_mq_tag_wakeup_all(struct
> blk_mq_tags *tags, bool);
>   void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn
> *fn,
>   		void *priv);
>
> +static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set
> +*tag_set) {
> +	return tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED; }
> +
>   static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue
> *bt,
>   						 struct blk_mq_hw_ctx *hctx)
>   {
> +	struct request_queue *queue;
> +	struct blk_mq_tag_set *set;
> +
>   	if (!hctx)
>   		return &bt->ws[0];
> +	queue = hctx->queue;
> +	set = queue->tag_set;
> +
> +	if (blk_mq_is_sbitmap_shared(set))
> +		return sbq_wait_ptr(bt, &set->wait_index);
> +
>   	return sbq_wait_ptr(bt, &hctx->wait_index);
>   }
>
> diff --git a/block/blk-mq.h b/block/blk-mq.h index
> 4f12363d56bf..241607806f77 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -158,11 +158,6 @@ struct blk_mq_alloc_data {
>   	struct blk_mq_hw_ctx *hctx;
>   };
>
> -static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set
> *tag_set)
> -{
> -	return tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED;
> -}
> -
>   static inline struct blk_mq_tags *blk_mq_tags_from_data(struct
> blk_mq_alloc_data *data)
>   {
>   	if (data->flags & BLK_MQ_REQ_INTERNAL)
>
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index
> 957cb43c5de7..427c7934c29d 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -259,6 +259,8 @@ struct blk_mq_tag_set {
>
>   	struct mutex		tag_list_lock;
>   	struct list_head	tag_list;
> +
> +	atomic_t		wait_index;
>   };

John -

I have this patch in my local repo as I was doing similar change. Your above
exact patch is not tested, but similar attempt was done. It is good to
include in next revision.
There is no significant performance up or down using this change.
Currently system is not available for me to test. I will resume testing once
system is available.

>
>   /**
>
>
> Thanks,
> John