automatic testing of cgroup writeback limiting (was: Re: Does XFS support cgroup writeback limiting?)

I think it makes sense to include those who wrote the cgroup writeback
limiting in the discussion on how to automatically test this feature, so I am
adding fsdevel and Tejun to CC.

On Sunday, 29 November 2015 at 22:41:13 CET, Lutz Vieweg wrote:
> On 11/25/2015 10:35 PM, Dave Chinner wrote:
> >> 2) Create 3 different XFS filesystem instances on the block
> >>    device, one for access by only the "good" processes,
> >>    one for access by only the "evil" processes, one for
> >>    shared access by at least two "good" and two "evil"
> >>    processes.
> > 
> > Why do you need multiple filesystems? The writeback throttling is
> > designed to work within a single filesystem...
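
For concreteness, the setup described in 2) could look roughly like this - the
device and mount point names below are only placeholders, since the proposal
itself just says "3 different XFS filesystem instances on the block device":

  # one filesystem for the "good" writers, one for the "evil" writers, and one
  # shared by both, all on partitions of the same (throttled) scratch device
  mkfs.xfs -f /dev/sdb1 && mkdir -p /mnt/good   && mount /dev/sdb1 /mnt/good
  mkfs.xfs -f /dev/sdb2 && mkdir -p /mnt/evil   && mount /dev/sdb2 /mnt/evil
  mkfs.xfs -f /dev/sdb3 && mkdir -p /mnt/shared && mount /dev/sdb3 /mnt/shared
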
> 
> Hmm. Previously, I thought that the limiting of buffered writes
> was realized by keeping track of the owners of dirty pages, and
> that filesystem support was just required to make sure that writing
> via a filesystem did not "anonymize" the dirty data. From what
> I had read in blkio-controller.txt it seemed evident that limits
> would be accounted for "per block device", not "per filesystem", and
> options like
> 
> > echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
> document how to configure limits per block device.
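
For reference, configuring such a per-device limit against the cgroup-v1 blkio
controller could be sketched as follows - the /sys/fs/cgroup/blkio mount point
and the 8:16 major:minor pair are assumptions for an example scratch disk:

  # put the "evil" writers into their own cgroup and cap their direct/sync
  # writes to the scratch device (assumed to be 8:16) at 1 MB/s
  mkdir /sys/fs/cgroup/blkio/evil
  echo "8:16 1048576" > /sys/fs/cgroup/blkio/evil/blkio.throttle.write_bps_device
  echo $$ > /sys/fs/cgroup/blkio/evil/cgroup.procs   # move this shell into the cgroup
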
> 
> Now, after reading through the new Writeback section of blkio-controller.txt
> again, I am somewhat confused - the text states
> 
> > writeback operates on inode basis
> 
> and if that means inodes as in "file system inodes", this would
> indeed mean limits would be enforced "per filesystem" - and yet
> there are no options documented to specify limits for any specific
> filesystem.
> 
> Does this mean a process writing to a block device (not via a filesystem)
> and without "O_DIRECT" will dirty buffer pages, but those will not be limited
> (as they are neither synchronous nor via-filesystem writes)?
> That would mean VMs sharing some (physical or abstract) block device could
> not really be isolated regarding their asynchronous write I/O...
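
The case in question, sketched with dd (assuming /dev/sdb is the shared
scratch device):

  # buffered write straight to the block device: no filesystem, no O_DIRECT,
  # so the data is at first only dirtied in the page cache
  dd if=/dev/zero of=/dev/sdb bs=1M count=1024

  # the same write with O_DIRECT bypasses the page cache and is throttled
  # synchronously by the blkio.throttle.* limits
  dd if=/dev/zero of=/dev/sdb bs=1M count=1024 oflag=direct
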
> 
> > Metadata IO not throttled - it is owned by the filesystem and hence
> > root cgroup.
> 
> Ouch. That kind of defeats the purpose of limiting evil processes'
> ability to DoS other processes.
> Wouldn't it be possible to assign some arbitrary cost to metadata
> operations - like "account one page write to the originating process
> for each metadata change"? While certainly not allowing write bandwidth
> to be limited with byte precision, this would regain the ability to
> defend against DoS situations, and for well-behaved processes the "cost"
> accounted for their infrequent metadata operations would probably not
> really hurt their write speed.
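
To illustrate the concern, a sketch of a metadata-heavy "evil" workload (using
the example /mnt/evil mount point from the proposed setup): it writes almost
no data, yet every iteration generates inode and journal updates whose
writeback would, per the statement above, be attributed to the root cgroup:

  # create and rename many empty files: pure metadata churn, no data pages
  mkdir -p /mnt/evil/meta
  for i in $(seq 1 100000); do
      touch /mnt/evil/meta/f$i
      mv /mnt/evil/meta/f$i /mnt/evil/meta/g$i
  done
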
> 
> >>  The test is successful if all "good" processes terminate successfully
> >>  after a time not longer than it would take to write 10 times X MB to
> >>  the rate-limited block device.
> > 
> > if we are rate limiting to 1MB/s, then a 10s test is not long enough
> > to reach steady state. Indeed, it's going to take at least 30s worth
> > of IO to guarantee that we are getting writeback occurring for low
> > bandwidth streams....
> 
> Sure, the "X/100 MB per second" throttle on the scratch device
> was meant to result in a minimum test time of > 100s.
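
A worked example of the intended timing, with X = 100 picked purely for
illustration:

  X=100                      # each "good" process writes X MB in total
  RATE=$((X / 100))          # scratch device throttled to X/100 = 1 MB/s
  MIN_RUNTIME=$((X / RATE))  # so a single "good" writer needs at least 100 s
  TIMEOUT=$((10 * X / RATE)) # success criterion: done within the time it would
                             # take to write 10*X MB at that rate, i.e. 1000 s
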
> 
> > i.e. the test needs to run for a period of time and then measure
> > the throughput of each stream, comparing it against the expected
> > throughput for the stream, rather than trying to write a fixed
> > bandwidth....
> 
> The reason why I thought it a good idea to have the "good" processes
> use only a limited write rate was to spread their actual write activity
> over enough time that they could, after all, feel some back-pressure
> from the operating system - pressure that is applied only after the "bad"
> processes have filled up all RAM dedicated to the dirty buffer cache.
> 
> Assume the test instance has lots of memory and would be willing to
> spend many Gigabytes of RAM for dirty buffer caches. Chances are that
> in such a situation the "good" processes might be done writing their
> limited amount of data almost instantaneously, because the data just
> went to RAM.
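
The kind of paced "good" writer meant here could look like the following
sketch (/mnt/good/data and the 1 MB per second pace are example values, not
part of the proposal):

  # write 1 MB per second for 100 s, so the writes cannot all be absorbed by
  # the dirty page cache in a single burst
  for i in $(seq 0 99); do
      dd if=/dev/zero of=/mnt/good/data bs=1M count=1 seek=$i conv=notrunc 2>/dev/null
      sleep 1
  done
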
> 
> (I understand that if one used the absolute "blkio.throttle.write*" options,
> back-pressure could apply before the dirty buffer cache was maxed out, but
> in real-world scenarios people will almost always use the relative
> "blkio.weight" based limiting - after all, you usually don't want to throttle
> processes when there is plenty of bandwidth left that no other process wants
> at the same time.)
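
For completeness, the relative variant referred to here, again sketched
against the cgroup-v1 blkio hierarchy (paths and weight values are just
examples; the weights take effect with a proportional-share IO scheduler such
as CFQ):

  # work-conserving weights: "good" gets roughly 8x the share of "evil",
  # but only while the device is actually contended
  mkdir -p /sys/fs/cgroup/blkio/good /sys/fs/cgroup/blkio/evil
  echo 800 > /sys/fs/cgroup/blkio/good/blkio.weight
  echo 100 > /sys/fs/cgroup/blkio/evil/blkio.weight
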
> 
> 
> Regards,
> 
> Lutz Vieweg
> 


_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


