Unless the maximum budget B_max that BFQ can assign to a queue is set explicitly by the user, BFQ automatically updates B_max. In particular, BFQ dynamically sets B_max to the number of sectors that can be read, at the current estimated peak rate, during the maximum time, T_max, allowed before a budget timeout occurs. In formulas, if we denote as R_est the estimated peak rate, then B_max = T_max ∗ R_est. Hence, the higher R_est is with respect to the actual device peak rate, the higher the probability that processes incur budget timeouts unjustly is. Besides, a too high value of B_max unnecessarily increases the deviation from an ideal, smooth service. To filter out spikes, the estimated peak rate is updated only on the expiration of queues that have been served for a long-enough time. As a first step, the estimator computes the device rate, R_meas, during the service of the queue. After that, if R_est < R_meas, then R_est is set to R_meas. Unfortunately, our experiments highlighted the following two problems. First, because of ZBR, depending on the locality of the workload, the estimator may easily converge to a value that is appropriate only for part of a disk. Second, R_est may jump (and remain forever equal) to a much higher value than the actual device peak rate, in case of hits in the drive cache, which may let sectors be transferred in practice at bus rate. To try to converge to the actual average peak rate over the disk surface (in case of rotational devices), and to smooth the spikes caused by the drive cache, this patch changes the estimator as follows. In the description of the changes, we refer to a queue containing random requests as 'seeky', according to the terminology used in the code, and inherited from CFQ. - First, now R_est may be updated also in case the just-expired queue, despite not being detected as seeky, has not been however able to consume all of its budget within the maximum time slice T_max. In fact, this is an indication that B_max is too large. Since B_max = T_max ∗ R_est, R_est is then probably too large, and should be reduced. - Second, to filter the spikes in R_meas, a discrete low-pass filter is now used to update R_est instead of just keeping the highest rate sampled. The rationale is that the average peak rate of a disk over its surface is a relatively stable quantity, hence a low-pass filter should converge more or less quickly to the right value. With the current values of the constants used in the filter, the latter seems to effectively smooth fluctuations and allow the estimator to converge to the actual peak rate with all the devices we tested. Signed-off-by: Paolo Valente <paolo.valente@xxxxxxxxxx> Signed-off-by: Arianna Avanzini <avanzini.arianna@xxxxxxxxx> --- block/cfq-iosched.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index de914a3..9792cd7 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3038,7 +3038,7 @@ static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout) * throughput. See the code for more details. */ static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq, - bool compensate) + bool compensate, enum bfqq_expiration reason) { u64 bw, usecs, expected, timeout; ktime_t delta; @@ -3074,10 +3074,23 @@ static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq, * the peak rate estimation. */ if (usecs > 20000) { - if (bw > bfqd->peak_rate) { - bfqd->peak_rate = bw; + if (bw > bfqd->peak_rate || + (!BFQQ_SEEKY(bfqq) && + reason == BFQ_BFQQ_BUDGET_TIMEOUT)) { + bfq_log(bfqd, "measured bw =%llu", bw); + /* + * To smooth oscillations use a low-pass filter with + * alpha=7/8, i.e., + * new_rate = (7/8) * old_rate + (1/8) * bw + */ + do_div(bw, 8); + if (bw == 0) + return 0; + bfqd->peak_rate *= 7; + do_div(bfqd->peak_rate, 8); + bfqd->peak_rate += bw; update = 1; - bfq_log(bfqd, "new peak_rate=%llu", bw); + bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate); } update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1; @@ -3157,7 +3170,7 @@ static void bfq_bfqq_expire(struct bfq_data *bfqd, * Update disk peak rate for autotuning and check whether the * process is slow (see bfq_update_peak_rate). */ - slow = bfq_update_peak_rate(bfqd, bfqq, compensate); + slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason); /* * As above explained, 'punish' slow (i.e., seeky), timed-out -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html