On Tue, Jun 06, 2017 at 03:12:12PM -0600, Jens Axboe wrote:
> On 06/06/2017 01:40 PM, Shaohua Li wrote:
> > From: Shaohua Li <shli@xxxxxx>
> >
> > Hard disk IO latency varies a lot depending on spindle movement; it
> > can range from several microseconds to several milliseconds, so it's
> > pretty hard to pick the baseline latency used by io.low.
> >
> > We use a different strategy here. The idea is to use only IO that
> > involves spindle movement to determine whether a cgroup's IO is in a
> > good state. For HD, if the IO latency is small (< 1ms), we ignore
> > the IO: such IO is likely sequential and doesn't help determine
> > whether a cgroup's IO is impacted by other cgroups. With this, we
> > only account IO with large latency, and we can then choose a
> > hardcoded baseline latency for HD (4ms, which is a typical IO
> > latency with a seek). With these settings, the io.low latency works
> > for both HD and SSD.
>
> I think that makes sense for reads - a quick read is certainly a cache
> hit, due to a sequential IO hit. But what about for writes? Most are
> absorbed by the write cache, so most will be < 1ms probably. Let's say
> an app is doing random writes, the one write we do account will
> potentially come at a much higher cost than the actual cost of that
> one write. Similarly for sequential writes, but the ratio of costs is
> closer there.
>
> Maybe that's OK?

Good question. The concern is: say we have 10 IOs. Without a write
cache, all 10 would have large latency; with the cache, only the 10th
IO has a very large latency, so we miss accounting some IOs. I don't
have a good solution for this. On the other hand, this wouldn't happen
in a workload with sustained IO, so the problem doesn't sound like a
big deal. We can probably add a check later: if the collected sample
count isn't big enough, ignore the samples. I think that can partly
mitigate this problem.

Thanks,
Shaohua
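
P.S. For clarity, a rough sketch of the accounting rule above, including
the minimum-sample-count mitigation. All names, the window struct, and
the 16-sample floor are made up for illustration; this is not the
actual blk-throttle code.

#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC		1000000ULL
#define HD_LAT_IGNORE_NS	(1 * NSEC_PER_MSEC)	/* < 1ms: likely sequential, skip */
#define HD_LAT_BASELINE_NS	(4 * NSEC_PER_MSEC)	/* typical latency with a seek */
#define MIN_LAT_SAMPLES		16			/* hypothetical floor */

struct lat_window {
	uint64_t nr_samples;
	uint64_t total_ns;
};

/*
 * Account one completed IO. On rotational media, IO finishing in
 * under 1ms is likely sequential (or absorbed by a cache) and says
 * nothing about cross-cgroup interference, so it isn't sampled.
 */
static void lat_account(struct lat_window *w, uint64_t lat_ns,
			bool rotational)
{
	if (rotational && lat_ns < HD_LAT_IGNORE_NS)
		return;
	w->nr_samples++;
	w->total_ns += lat_ns;
}

/*
 * Decide whether a cgroup's IO is in a good state. For HD the target
 * is floored at the hardcoded 4ms baseline. A window with too few
 * samples (e.g. most writes absorbed by the write cache) is ignored
 * and treated as good -- the mitigation mentioned above.
 */
static bool lat_is_good(const struct lat_window *w, uint64_t target_ns,
			bool rotational)
{
	uint64_t avg;

	if (w->nr_samples < MIN_LAT_SAMPLES)
		return true;

	avg = w->total_ns / w->nr_samples;
	if (rotational && target_ns < HD_LAT_BASELINE_NS)
		target_ns = HD_LAT_BASELINE_NS;

	return avg <= target_ns;
}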