Hello,

On Tue, Dec 01, 2015 at 09:38:03AM +0100, Martin Steigerwald wrote:
> > > echo "<major>:<minor> <rate_bytes_per_second>" >
> > >   /cgrp/blkio.throttle.read_bps_device
> >
> > document how to configure limits per block device.
> >
> > Now after reading through the new Writeback section of
> > blkio-controller.txt again I am somewhat confused - the text states
> >
> > > writeback operates on inode basis

As opposed to pages.  cgroup ownership is tracked per inode, not per
page, so if multiple cgroups write to the same inode at the same time,
some IOs will be incorrectly attributed.

> > and if that means inodes as in "file system inodes", this would
> > indeed mean limits would be enforced "per filesystem" - and yet
> > there are no options documented to specify limits for any specific
> > filesystem.

cgroup ownership is per-inode.  IO throttling is per-device, so as
long as multiple filesystems map to the same device, they fall under
the same limit.

> > > Metadata IO not throttled - it is owned by the filesystem and
> > > hence root cgroup.
> >
> > Ouch.  That kind of defeats the purpose of limiting evil processes'
> > ability to DOS other processes.

cgroup isn't a security mechanism and has to make active tradeoffs
between isolation and overhead.  It doesn't provide protection against
malicious users, and in general it's a pretty bad idea to depend on
cgroup for protection against hostile entities.  Although some
controllers do better isolation than others, given how filesystems are
implemented, filesystem IO control getting there will likely take a
while.

> > Wouldn't it be possible to assign some arbitrary cost to meta-data
> > operations - like "account one page write for each meta-data change
> > to the originating process of that change"?
> > While certainly not allowing for byte-precise limits on write
> > bandwidth, this would regain the ability to defend against DOS
> > situations, and for well-behaved processes, the "cost" accounted
> > for their not-so-frequent meta-data operations would probably not
> > really hurt their writing speed.

For aggregate consumers, this sort of approach does make sense -
measure total consumption by common operations and distribute the
charges afterwards; however, this will require quite a bit of work on
both the io controller and filesystem sides.

> > (I understand that if one used the absolute "blkio.throttle.write*"
> > options back pressure could apply before the dirty buffer cache was
> > maxed out, but in real-world scenarios people will almost always use
> > the relative "blkio.weight" based limiting; after all, you usually
> > don't want to throttle processes if there is plenty of bandwidth
> > left that no other process wants at the same time.)

I'd recommend configuring both memory.high and io.weight so that the
buffer area isn't disproportionately large compared to io bandwidth.
It should be able to reach the configured ratio that way, and it also
avoids two io domains competing in the same memory domain, which can
skew the results.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
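[Editor's note: as a concrete sketch of the knobs discussed in this
thread - the device numbers, byte rates, cgroup name "workload", and a
cgroup2 hierarchy mounted at /sys/fs/cgroup are all illustrative
assumptions, not values from the mail itself:]

```shell
# Absolute throttling (cgroup v1 blkio, as quoted at the top of the
# mail): hard-limit reads on device 8:0 to 10 MiB/s.  The device is
# named by <major>:<minor>, the limit in bytes per second.
echo "8:0 10485760" > /cgrp/blkio.throttle.read_bps_device

# The combination recommended at the end (cgroup v2 interface files):
# cap dirty-buffer growth with memory.high so back pressure arrives
# before the cache is maxed out, and set a proportional io.weight so
# writeback is distributed by weight rather than hard limits.
echo "512M" > /sys/fs/cgroup/workload/memory.high
echo "default 100" > /sys/fs/cgroup/workload/io.weight
```

The point of pairing the two is that the memory allowance and the io
weight then describe the same domain, so the controller can reach the
configured ratio instead of one huge shared dirty pool skewing the
per-cgroup accounting.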