Hello,

On Thu, Dec 15, 2016 at 12:33:07PM -0800, Shaohua Li wrote:
> User configures latency target, but the latency threshold for each
> request size isn't fixed. For a SSD, the IO latency highly depends on
> request size. To calculate latency threshold, we sample some data, eg,
> average latency for request size 4k, 8k, 16k, 32k .. 1M. The latency
> threshold of each request size will be the sample latency (I'll call it
> base latency) plus latency target. For example, the base latency for
> request size 4k is 80us and user configures latency target 60us. The 4k
> latency threshold will be 80 + 60 = 140us.

Ah okay, the user configures the extra latency.  Yeah, this is way
better than treating what the user configures as the target latency
for 4k IOs.

> @@ -25,6 +25,8 @@ static int throtl_quantum = 32;
>  #define DFL_IDLE_THRESHOLD_HD (1000 * 1000) /* 1 ms */
>  #define MAX_IDLE_TIME (500L * 1000 * 1000) /* 500 ms */
>  
> +#define SKIP_TRACK (((u64)1) << BLK_STAT_RES_SHIFT)

SKIP_LATENCY?

> +static void throtl_update_latency_buckets(struct throtl_data *td)
> +{
> +	struct avg_latency_bucket avg_latency[LATENCY_BUCKET_SIZE];
> +	int i, cpu;
> +	u64 last_latency = 0;
> +	u64 latency;
> +
> +	if (!blk_queue_nonrot(td->queue))
> +		return;
> +	if (time_before(jiffies, td->last_calculate_time + HZ))
> +		return;
> +	td->last_calculate_time = jiffies;
> +
> +	memset(avg_latency, 0, sizeof(avg_latency));
> +	for (i = 0; i < LATENCY_BUCKET_SIZE; i++) {
> +		struct latency_bucket *tmp = &td->tmp_buckets[i];
> +
> +		for_each_possible_cpu(cpu) {
> +			struct latency_bucket *bucket;
> +
> +			/* this isn't race free, but ok in practice */
> +			bucket = per_cpu_ptr(td->latency_buckets, cpu);
> +			tmp->total_latency += bucket[i].total_latency;
> +			tmp->samples += bucket[i].samples;

Heh, this *can* lead to surprising results (like reading zero for a
value larger than 2^32) on 32-bit machines due to split updates, and
if we're using nanosecs, those surprises have a chance, albeit low, of
happening every four secs, which is a bit unsettling.  If we have to
use nanosecs, let's please use u64_stats_sync.  If we're okay with
microsecs, ulongs should be fine.

> void blk_throtl_bio_endio(struct bio *bio)
> {
> 	struct throtl_grp *tg;
> +	u64 finish_time;
> +	u64 start_time;
> +	u64 lat;
> 
> 	tg = bio->bi_cg_private;
> 	if (!tg)
> 		return;
> 	bio->bi_cg_private = NULL;
> 
> -	tg->last_finish_time = ktime_get_ns();
> +	finish_time = ktime_get_ns();
> +	tg->last_finish_time = finish_time;
> +
> +	start_time = blk_stat_time(&bio->bi_issue_stat);
> +	finish_time = __blk_stat_time(finish_time);
> +	if (start_time && finish_time > start_time &&
> +	    tg->td->track_bio_latency == 1 &&
> +	    !(bio->bi_issue_stat.stat & SKIP_TRACK)) {

Heh, can't we collapse some of the conditions?  e.g. flip SKIP_TRACK
to TRACK_LATENCY and set it iff the td has track_bio_latency set and
also the bio has start time set?

> @@ -2106,6 +2251,12 @@ int blk_throtl_init(struct request_queue *q)
> 	td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
> 	if (!td)
> 		return -ENOMEM;
> +	td->latency_buckets = __alloc_percpu(sizeof(struct latency_bucket) *
> +		LATENCY_BUCKET_SIZE, __alignof__(u64));
> +	if (!td->latency_buckets) {
> +		kfree(td);
> +		return -ENOMEM;
> +	}
> 
> 	INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
> 	throtl_service_queue_init(&td->service_queue);

> @@ -2119,10 +2270,13 @@ int blk_throtl_init(struct request_queue *q)
> 	td->low_upgrade_time = jiffies;
> 	td->low_downgrade_time = jiffies;
> 
> +	td->track_bio_latency = UINT_MAX;

I don't think using 0, 1, UINT_MAX as enums is good for readability.

Thanks.

-- 
tejun