On Mon, Nov 28, 2016 at 05:21:48PM -0500, Tejun Heo wrote:
> Hello, Shaohua.
>
> On Wed, Nov 23, 2016 at 05:15:18PM -0800, Shaohua Li wrote:
> > > Hmm... I'm not sure thinktime is the best measure here. Think time is
> > > used by cfq mainly to tell the likely future behavior of a workload so
> > > that cfq can take speculative actions on the prediction. However,
> > > given that the implemented high limit behavior tries to provide a
> > > certain level of latency target, using the predictive thinktime to
> > > regulate behavior might lead to overly unpredictable behavior.
> >
> > Latency just reflects one side of the IO. Latency and think time have no
> > relationship. For example, a cgroup dispatching 1 IO per second can still have
> > high latency. If we only take latency into account, we will think the cgroup is
> > busy, which is not justified.
>
> Yes, the two are independent metrics; however, whether a cgroup is
> considered idle or not affects whether blk-throttle will adhere to the
> latency target or not. Thinktime is a magic number which can be good
> but whose behavior can be very difficult to predict from outside the
> black box. What I was trying to say was that putting in thinktime
> here can greatly weaken the configured latency target in unobvious
> ways.
>
> > > Moreover, I don't see why we need to bother with predictions anyway.
> > > cfq needed it but I don't think that's the case for blk-throtl. It
> > > can just provide an idle threshold where a cgroup which hasn't issued an
> > > IO over that threshold is considered idle. That'd be a lot easier to
> > > understand and configure from userland while providing a good enough
> > > mechanism to prevent idle cgroups from clamping down utilization for
> > > too long.
> >
> > We could do this, but it will only work for a very idle workload, e.g., one
> > that is completely idle. If a workload dispatches IO sporadically, this will
> > likely not work. The average think time is more precise for prediction.
>
> But we can increase sharing by upping the target latency. That should
> be the main knob: if low, the user wants a stricter service guarantee
> at the cost of lower overall utilization; if high, the workload can
> deal with higher latency and the system can achieve higher overall
> utilization. I think the idle detection should be an extra mechanism
> which can be used to ignore cgroup-disk combinations which are staying
> idle for a long time.

Yes, we can increase the latency target to increase sharing, but latency and
think time are different. In the example I mentioned earlier, where a cgroup
sends just 1 IO per second, we would have to raise the latency target very high
to increase sharing. I don't think this is what users want. In summary, we
can't use latency alone to determine whether cgroups can dispatch more IO.

Currently, think-time idle detection is an extra mechanism for ignoring a
cgroup's limit, so we only ignore the limit when think time is big or latency
is small. This does make the behavior a little harder to predict (e.g., the
latency target is sometimes not respected), but it is necessary for better
sharing.

Thanks,
Shaohua