On 06/06/2017 04:11 PM, Shaohua Li wrote: > On Tue, Jun 06, 2017 at 03:12:12PM -0600, Jens Axboe wrote: >> On 06/06/2017 01:40 PM, Shaohua Li wrote: >>> From: Shaohua Li <shli@xxxxxx> >>> >>> hard disk IO latency varies a lot depending on spindle move. The latency >>> range could be from several microseconds to several milliseconds. It's >>> pretty hard to get the baseline latency used by io.low. >>> >>> We will use a different stragety here. The idea is only using IO with >>> spindle move to determine if cgroup IO is in good state. For HD, if io >>> latency is small (< 1ms), we ignore the IO. Such IO is likely from >>> sequential IO, and is helpless to help determine if a cgroup's IO is >>> impacted by other cgroups. With this, we only account IO with big >>> latency. Then we can choose a hardcoded baseline latency for HD (4ms, >>> which is typical IO latency with seek). With all these settings, the >>> io.low latency works for both HD and SSD. >> >> I think that makes sense for reads - a quick read is certainly a cache >> hit, due to a sequential IO hit. But what about for writes? Most are >> absorbed by the write cache, so most will be < 1ms probably. Let's say >> an app is doing random writes, the one write we do account will >> potentially come at a much higher cost than the actual cost of that one >> write. Simiarly for sequential writes, but the ratio of costs is closer >> there. >> >> Maybe that's OK? > > Good question. So the concern is say we have 10 io, without cache, the 10 io > will have large latency; but with cache, only the 10th io has a very large > latency, so we miss accounting some IOs. Precisely! > I don't have a good solution for this. On the other hand, this > wouldn't occur in workload with sustained IO, so this problem sounds > not a big deal. We probably can add a check later if collected sample > count isn't enough, ignore the samples, which I think can mitigate > this problem partly. I'm almost wondering if it doesn't just make sense to have fixed costs for a single spindle. But it probably doesn't make sense to special case that, and I think you are right in that it works itself out over time with sustained IO. I'll add your patch for 4.12. -- Jens Axboe