Re: automatic testing of cgroup writeback limiting

Hello, Lutz.

On Thu, Dec 03, 2015 at 01:18:48AM +0100, Lutz Vieweg wrote:
> On 12/01/2015 05:38 PM, Tejun Heo wrote:
> >As opposed to pages.  cgroup ownership is tracked per inode, not per
> >page, so if multiple cgroups write to the same inode at the same time,
> >some IOs will be incorrectly attributed.
> 
> I can't think of use cases where this could become a problem.
> If more than one user/container/VM is allowed to write to the
> same file at any one time, isolation is probably absent anyway ;-)

Yeap, that's why the trade-off was made.

> >cgroup ownership is per-inode.  IO throttling is per-device, so as
> >long as multiple filesystems map to the same device, they fall under
> >the same limit.
> 
> Good, that's why I assumed it useful to include more than one
> filesystem on the same device in the test scenario, just to know
> whether there are unexpected issues if several filesystems utilize
> the same underlying device.

Sure.  I'd recommend also including the case of multiple writers on a
single filesystem, as that exercises the entanglement in metadata
handling and should expose problems in more places.
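For illustration, such a test case could be sketched roughly as below.
This is only a sketch, not a tested script: it assumes cgroup v2 with
the io controller enabled, root privileges, and a filesystem on device
8:16 mounted at /mnt/test; all paths, device numbers and limits are
placeholders.

```shell
#!/bin/bash
# Two buffered writers in separate cgroups on the same filesystem,
# one throttled and one not.  Assumes cgroup v2 mounted at
# /sys/fs/cgroup with the io controller enabled in subtree_control.

mkdir -p /sys/fs/cgroup/wr-a /sys/fs/cgroup/wr-b

# Throttle writeback bandwidth for group A only (1 MiB/s).
echo "8:16 wbps=1048576" > /sys/fs/cgroup/wr-a/io.max

# One buffered writer per group, both on the same filesystem.
( echo $BASHPID > /sys/fs/cgroup/wr-a/cgroup.procs
  dd if=/dev/zero of=/mnt/test/file-a bs=1M count=512 conv=fsync ) &
( echo $BASHPID > /sys/fs/cgroup/wr-b/cgroup.procs
  dd if=/dev/zero of=/mnt/test/file-b bs=1M count=512 conv=fsync ) &
wait

# Expectation: file-b finishes quickly while file-a's writeback is
# held to roughly the configured rate.
```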

> I wrote of "evil" processes for simplicity, but 99 out of 100 times
> it's not intentional "evilness" that makes a process exhaust I/O
> bandwidth of some device shared with other users/containers/VMs, it's
> usually just bugs, inconsiderate programming or inappropriate use
> that makes one process write like crazy, making other
> users/containers/VMs suffer.

Right now, what cgroup writeback can control is well-behaved workloads
which aren't dominated by metadata writeback.  We still have a ways to
go, but it's a huge leap compared to what we had before.
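For what it's worth, how IO ends up attributed per cgroup can be
inspected directly under cgroup v2.  A sketch, where the cgroup name
and device number are placeholders:

```shell
# Per-cgroup IO accounting, one line per device.
cat /sys/fs/cgroup/mygrp/io.stat
# Each line has the form:
#   MAJ:MIN rbytes=... wbytes=... rios=... wios=... dbytes=... dios=...
# so wbytes/wios show how much write traffic was charged to this group.
```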

> Wherever strict service level guarantees are relevant, and
> applications require writing to storage, you currently cannot
> consolidate two or more applications onto the same physical host,
> even if they run under separate users/containers/VMs.

You're right.  It can't do isolation well enough for things like
strict service level guarantee.

> I understand there is no short- or medium-term solution that
> would allow isolating processes writing to the same filesystem
> (because of the metadata writing), but is it correct to say
> that at least VMs which do not let the virtual guest cause
> extensive metadata writes on the physical host, and which only
> write into pre-allocated image files, can be safely isolated
> by the new "buffered write accounting"?

Sure, that or loop mounts.  Pure data accesses should be fairly well
isolated.
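The pre-allocated image setup discussed above could look roughly like
this: the guest's filesystem lives inside a fully allocated image, so
guest metadata churn stays within the image and the host sees plain
data writes to a single inode.  Paths and sizes are placeholders, and
this assumes root privileges.

```shell
# Preallocate the backing file so guest writes never trigger
# host-side block allocation.
fallocate -l 10G /srv/images/guest1.img
mkfs.ext4 -F /srv/images/guest1.img

# For a loop-mounted container filesystem (the alternative mentioned
# above), mount the image on the host:
mount -o loop /srv/images/guest1.img /mnt/guest1
```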

Thanks.

-- 
tejun