Hello, Lutz.

On Thu, Dec 03, 2015 at 01:18:48AM +0100, Lutz Vieweg wrote:
> On 12/01/2015 05:38 PM, Tejun Heo wrote:
> >As opposed to pages. cgroup ownership is tracked per inode, not per
> >page, so if multiple cgroups write to the same inode at the same time,
> >some IOs will be incorrectly attributed.
>
> I can't think of use cases where this could become a problem.
> If more than one user/container/VM is allowed to write to the
> same file at any one time, isolation is probably absent anyway ;-)

Yeap, that's why the trade-off was made.

> >cgroup ownership is per-inode. IO throttling is per-device, so as
> >long as multiple filesystems map to the same device, they fall under
> >the same limit.
>
> Good, that's why I assumed it useful to include a scenario with more
> than one filesystem on the same device into the test scenario, just
> to know whether there are unexpected issues if more than one filesystem
> utilizes the same underlying device.

Sure, I'd recommend including the case of multiple writers on a single
filesystem too, as that exposes entanglement in metadata handling.
That should expose problems in more places.

> I wrote of "evil" processes for simplicity, but 99 out of 100 times
> it's not intentional "evilness" that makes a process exhaust I/O
> bandwidth of some device shared with other users/containers/VMs, it's
> usually just bugs, inconsiderate programming or inappropriate use
> that makes one process write like crazy, making other
> users/containers/VMs suffer.

Right now, what cgroup writeback can control is well-behaved
workloads which aren't dominated by metadata writeback. We still
have a ways to go, but it is still a huge leap compared to what we
had before.

> Wherever strict service level guarantees are relevant, and
> applications require writing to storage, you currently cannot
> consolidate two or more applications onto the same physical host,
> even if they run under separate users/containers/VMs.

You're right.
It can't do isolation well enough for things like strict service
level guarantees.

> I understand there is no short or medium term solution that
> would allow isolating processes writing to the same filesystem
> (because of the metadata writing), but is it correct to say
> that at least VMs, which do not allow the virtual guest to
> cause extensive metadata writes on the physical host, only
> writes into pre-allocated image files, can be safely isolated
> by the new "buffered write accounting"?

Sure, that or loop mounts. Pure data accesses should be fairly well
isolated.

Thanks.

-- 
tejun