On Wed, Jul 12, 2017 at 03:39:14PM +0000, Bart Van Assche wrote:
> On Wed, 2017-07-12 at 10:30 +0800, Ming Lei wrote:
> > On Tue, Jul 11, 2017 at 12:25:16PM -0600, Jens Axboe wrote:
> > > What happens with fluid congestion boundaries, with shared tags?
> >
> > The approach in this patch should work, but the threshold may not
> > be accurate in this way; one simple method is to use the average
> > tag weight from an EWMA, like this:
> >
> > 	sbitmap_weight() / hctx->tags->active_queues
>
> Hello Ming,
>
> That approach would result in a severe performance degradation.
> "active_queues" namely represents the number of queues against which
> I/O has ever been queued. If e.g. 64 LUNs were associated with a
> single SCSI host, all 64 LUNs were responding, and the queue depth
> were also 64, then the approach you propose would reduce the
> effective queue depth per LUN from 64 to 1.

No, this approach does _not_ reduce the effective queue depth; it only
stops the queue for a while when the queue is busy enough.

In that case there may not be any congestion, because blk-mq allows at
most queue_depth/active_queues tags to be assigned to each LUN; please
see hctx_may_queue(). Then get_driver_tag() can return at most one
pending tag to each request_queue (LUN).

The algorithm in this patch only starts to work once congestion
happens; that is, it only runs when BLK_STS_RESOURCE is returned from
.queue_rq().

This approach avoids dispatching requests to a busy queue
unnecessarily, so we do not burn CPU for nothing, and request merging
improves in the meantime.

-- 
Ming