Hello,

On Tue, Dec 01, 2015 at 09:38:03AM +0100, Martin Steigerwald wrote:
> > > echo "<major>:<minor> <rate_bytes_per_second>" >
> > >   /cgrp/blkio.throttle.read_bps_device
> >
> > document how to configure limits per block device.
> >
> > Now after reading through the new Writeback section of
> > blkio-controller.txt again I am somewhat confused - the text states
> >
> > > writeback operates on inode basis

As opposed to pages.  cgroup ownership is tracked per inode, not per
page, so if multiple cgroups write to the same inode at the same time,
some IOs will be incorrectly attributed.

> > and if that means inodes as in "file system inodes", this would
> > indeed mean limits would be enforced "per filesystem" - and yet
> > there are no options documented to specify limits for any specific
> > filesystem.

cgroup ownership is per-inode.  IO throttling is per-device, so as
long as multiple filesystems map to the same device, they fall under
the same limit.

> > > Metadata IO not throttled - it is owned by the filesystem and
> > > hence root cgroup.
> >
> > Ouch.  That kind of defeats the purpose of limiting evil processes'
> > ability to DOS other processes.

cgroup isn't a security mechanism and has to make active tradeoffs
between isolation and overhead.  It doesn't provide protection against
malicious users, and in general it's a pretty bad idea to depend on
cgroup for protection against hostile entities.  Although some
controllers do better isolation than others, given how filesystems are
implemented, filesystem IO control getting there will likely take a
while.

> > Wouldn't it be possible to assign some arbitrary cost to meta-data
> > operations - like "account one page write for each meta-data change
> > to the originating process of that change"?
> > While certainly not allowing for byte-precise limits on write
> > bandwidth, this would regain the ability to defend against DOS
> > situations, and for well-behaved processes, the "cost" accounted
> > for their not-so-frequent meta-data operations would probably not
> > really hurt their writing speed.

For aggregate consumers, this sort of approach does make sense -
measure total consumption by common operations and distribute the
charges afterwards; however, this will require quite a bit of work on
both the io controller and filesystem sides.

> > (I understand that if one used the absolute "blkio.throttle.write*"
> > options back pressure could apply before the dirty buffer cache was
> > maxed out, but in real-world scenarios people will almost always use
> > the relative "blkio.weight" based limiting; after all, you usually
> > don't want to throttle processes if there is plenty of bandwidth
> > left that no other process wants at the same time.)

I'd recommend configuring both memory.high and io.weight so that the
buffer area isn't disproportionately large compared to io bandwidth.
It should be able to reach the configured ratio that way, and it also
avoids two io domains competing in the same memory domain, which can
skew the results.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
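[Editor's note: as a concrete sketch of the knobs discussed in this
thread - the device numbers, byte rates, cgroup name "workload", and a
cgroup2 hierarchy mounted at /sys/fs/cgroup are all illustrative
assumptions, not values from the mail itself:]

```shell
# Absolute throttling (cgroup v1 blkio, as quoted at the top of the
# mail): hard-limit reads on device 8:0 to 10 MiB/s.  The device is
# named by <major>:<minor>, the limit in bytes per second.
echo "8:0 10485760" > /cgrp/blkio.throttle.read_bps_device

# The combination recommended at the end (cgroup v2 interface files):
# cap dirty-buffer growth with memory.high so back pressure arrives
# before the cache is maxed out, and set a proportional io.weight so
# writeback is distributed by weight rather than hard limits.
echo "512M" > /sys/fs/cgroup/workload/memory.high
echo "default 100" > /sys/fs/cgroup/workload/io.weight
```

The point of pairing the two is that the memory allowance and the io
weight then describe the same domain, so the controller can reach the
configured ratio instead of one huge shared dirty pool skewing the
per-cgroup accounting.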