On Tue 01-11-16 15:08:50, Jens Axboe wrote:
> We can hook this up to the block layer, to help throttle buffered
> writes.
>
> wbt registers a few trace points that can be used to track what is
> happening in the system:
>
> wbt_lat: 259:0: latency 2446318
> wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
>           wmean=518866, wmin=15522, wmax=5330353, wsamples=57
> wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32
>
> This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
> dumps the current read/write stats for that window, and wbt_step shows a
> step down event where we now scale back writes. Each trace includes the
> device, 259:0 in this case.

Just one serious question and one nit below:

> +void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
> +{
> +	struct rq_wait *rqw;
> +	int inflight, limit;
> +
> +	if (!(wb_acct & WBT_TRACKED))
> +		return;
> +
> +	rqw = get_rq_wait(rwb, wb_acct & WBT_KSWAPD);
> +	inflight = atomic_dec_return(&rqw->inflight);
> +
> +	/*
> +	 * wbt got disabled with IO in flight. Wake up any potential
> +	 * waiters, we don't have to do more than that.
> +	 */
> +	if (unlikely(!rwb_enabled(rwb))) {
> +		rwb_wake_all(rwb);
> +		return;
> +	}
> +
> +	/*
> +	 * If the device does write back caching, drop further down
> +	 * before we wake people up.
> +	 */
> +	if (rwb->wc && !wb_recent_wait(rwb))
> +		limit = 0;
> +	else
> +		limit = rwb->wb_normal;

So for devices with a write cache, you will completely drain the device
before waking up anybody waiting to issue new requests. Isn't that too
strict? In particular, may_queue() will allow new writers to issue new
writes once we drop below the limit, so couldn't some processes end up
effectively starved waiting in may_queue()?

> +static void wb_timer_fn(unsigned long data)
> +{
> +	struct rq_wb *rwb = (struct rq_wb *) data;
> +	unsigned int inflight = wbt_inflight(rwb);
> +	int status;
> +
> +	status = latency_exceeded(rwb);
> +
> +	trace_wbt_timer(rwb->bdi, status, rwb->scale_step, inflight);
> +
> +	/*
> +	 * If we exceeded the latency target, step down. If we did not,
> +	 * step one level up. If we don't know enough to say either exceeded
> +	 * or ok, then don't do anything.
> +	 */
> +	switch (status) {
> +	case LAT_EXCEEDED:
> +		scale_down(rwb, true);
> +		break;
> +	case LAT_OK:
> +		scale_up(rwb);
> +		break;
> +	case LAT_UNKNOWN_WRITES:
> +		scale_up(rwb);
> +		break;
> +	case LAT_UNKNOWN:
> +		if (++rwb->unknown_cnt < RWB_UNKNOWN_BUMP)
> +			break;
> +		/*
> +		 * We get here for two reasons:
> +		 *
> +		 * 1) We previously scaled down (reduced depth), and we
> +		 *    currently don't have a valid read/write sample. For
> +		 *    that case, slowly return to center state (step == 0).
> +		 * 2) We started at the center step, but don't have a valid
> +		 *    read/write sample, but we do have writes going on.
> +		 *    Allow step to go negative, to increase write perf.
> +		 */

I think part 2) of the comment now belongs with the LAT_UNKNOWN_WRITES
label.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
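
PS: To make the starvation worry concrete, here is a minimal standalone
userspace toy of the ordering I have in mind. It only borrows names from
the patch (inflight, may_queue); the limits are made up and this is my
paraphrase of the logic, not the kernel code:

/* Toy model of the admission ordering; not the kernel code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int inflight;         /* writes currently in flight */
static const int queue_limit = 16;  /* stands in for rwb->wb_normal */

/*
 * Fast path: a newly arriving writer is admitted as soon as we are
 * below the queue limit, regardless of how long others have slept.
 */
static bool may_queue(void)
{
	if (atomic_load(&inflight) >= queue_limit)
		return false;
	atomic_fetch_add(&inflight, 1);
	return true;
}

/*
 * Sleeper path: with limit == 0 in __wbt_done() (write-cached device,
 * no recent wait), a waiter is only woken once the device is fully
 * drained.
 */
static bool sleeper_woken(void)
{
	return atomic_load(&inflight) == 0;
}

int main(void)
{
	atomic_store(&inflight, 16);     /* queue is full, a writer sleeps */
	atomic_fetch_sub(&inflight, 1);  /* one write completes: 15 left  */

	/* A fresh writer slips in through the fast path... */
	printf("new writer admitted: %d\n", may_queue());
	/*
	 * ...so inflight is back at 16 and the sleeper stays asleep. As
	 * long as fresh writers keep arriving, inflight never reaches
	 * zero and the sleeper is starved.
	 */
	printf("sleeper woken: %d\n", sleeper_woken());
	return 0;
}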
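
And to spell out the nit, here is the step machine as I read its intent,
again as a toy: the enum and macro names mirror the patch, but the bodies
are my paraphrase (including a guess at what LAT_UNKNOWN does after the
quoted comment, which was cut above), not the actual wb_timer_fn():

/* My reading of the intended state machine; not the actual code. */
enum lat_status { LAT_OK, LAT_UNKNOWN, LAT_UNKNOWN_WRITES, LAT_EXCEEDED };

#define RWB_UNKNOWN_BUMP 5

static int scale_step;  /* 0 = center, > 0 = throttled, < 0 = deeper than default */
static int unknown_cnt;

static void timer_tick(enum lat_status status)
{
	switch (status) {
	case LAT_EXCEEDED:
		scale_step++;   /* scale_down(): throttle writes harder */
		break;
	case LAT_OK:
		scale_step--;   /* scale_up(): relax the throttling */
		break;
	case LAT_UNKNOWN_WRITES:
		/*
		 * Part 2) of the quoted comment belongs here: no read
		 * samples, but writes are going on, so allow the step to
		 * go negative to increase write performance.
		 */
		scale_step--;
		break;
	case LAT_UNKNOWN:
		if (++unknown_cnt < RWB_UNKNOWN_BUMP)
			break;
		/*
		 * Part 1): no valid sample at all for a while, so slowly
		 * drift back toward the center state (step == 0) from
		 * either direction.
		 */
		if (scale_step > 0)
			scale_step--;
		else if (scale_step < 0)
			scale_step++;
		break;
	}
}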