On Tue, Apr 03, 2012 at 01:00:14AM -0700, Fengguang Wu wrote:

[CC Jens]

[..]
> > I think blkio.weight can be thought of a system wide weight of a cgroup
> > and more than one entity/subsystem should be able to make use of it and
> > differentiate between IO in its own way. CFQ can decide to do proportional
> > time division, and buffered write controller should be able to use the
> > same weight and do write bandwidth differentiation. I think it is better
> > than introducing another buffered write controller tunable for weight.
> >
> > Personally, I am not too worried about this point. We can document and
> > explain it well.
>
> Agreed. The throttling may work in *either* bps, IOPS or disk time
> modes. In each mode blkio.weight is naturally tied to the
> corresponding IO metrics.

Well, Tejun does not like the idea of sharing config variables among
different policies. So I guess you will have to come up with your own
configuration variables as desired. As each policy will have its own
configuration and stats, prefixing the variable/stat name with the policy
name will help identify it. Not sure what's a good name for the buffered
write policy.

Maybe

blkio.dirty.weight
blkio.dirty.bps

blkio.buffered_write.* or blkio.buf_write* or blkio.dirty_rate.* or

[..]
> > Patch 6/6 shows simple test results for bps based throttling.
>
> Since then I've improved the patches to work in a more "contained" way
> when blkio.throttle.buffered_write_bps is not set.
>
> The old behavior is, if blkcg A contains 1 dd and blkcg B contains 10
> dd tasks and they have equal weight, B will get 10 times bandwidth
> than A.
>
> With the below updated core bits, A and B will get equal share of
> write bandwidth. The basic idea is to use

Yes, this new behavior makes more sense. Two equal-weight groups get
equal bandwidth irrespective of the number of tasks in each cgroup.

[..]
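To make the difference between the old and new behavior concrete, here is a
rough sketch of the arithmetic only. It is not the algorithm from the patch
(which lives in the dirty throttling code and is not reproduced here). The
names Cgroup, per_task_split and per_cgroup_split, and the 110 MB/s total,
are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    """Hypothetical stand-in for a blkcg; not a kernel structure."""
    name: str
    weight: int  # illustrative per-cgroup weight
    tasks: int   # number of dirtier tasks (e.g. dd processes) in the group


def per_task_split(groups, total_bw):
    """Old behavior: every task gets an equal share of the bandwidth,
    so a cgroup's total scales with its task count and weight is ignored."""
    total_tasks = sum(g.tasks for g in groups)
    return {g.name: total_bw * g.tasks / total_tasks for g in groups}


def per_cgroup_split(groups, total_bw):
    """New behavior: bandwidth is divided between cgroups by weight,
    independent of how many tasks each cgroup contains."""
    total_weight = sum(g.weight for g in groups)
    return {g.name: total_bw * g.weight / total_weight for g in groups}


if __name__ == "__main__":
    # A has 1 dd, B has 10 dd tasks, equal weights, 110 MB/s to share.
    groups = [Cgroup("A", 500, 1), Cgroup("B", 500, 10)]
    print("per task:  ", per_task_split(groups, 110.0))
    print("per cgroup:", per_cgroup_split(groups, 110.0))
```

With these made-up numbers the per-task split gives A 10 MB/s and B 100 MB/s
(the 10x skew described above), while the per-cgroup split gives each 55 MB/s.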
> Test results are "pretty good looking" :-) The attached graphs
> illustrates nice attributes of accuracy, fairness and smoothness
> for the following tests.

Indeed. These results are pretty cool. It is hard to believe that the
lines are so smooth, and the lines for the two tasks overlap so closely
that it is not obvious at first that they are overlapping and dirtying
equal amounts of memory. I had to take a second look to figure that out.

The results for the third graph (weights 500 and 1000 respectively) are
not perfect, though. I think ideally all 3 tasks should have dirtied the
same amount of memory. But achieving perfection here might not be easy,
and maybe not many people will care.

Given that you are doing a reasonable job of providing service
differentiation between buffered writers, I am wondering if you should
look at the ioprio of writers within a cgroup and provide service
differentiation among those too. CFQ has separate queues, but it loses
the context information by the time IO is submitted, so you might be able
to do a much better job. Anyway, this is a possible future enhancement
and not necessarily related to this patchset.

Also, we are controlling the rate of dirtying memory. I am again
wondering whether these configuration knobs should be part of the memory
controller and not the block controller. Think of the NFS case. There is
no block device or block layer involved, but we will still control the
rate of dirtying memory. So some control in the memory controller might
make sense, and the following kind of knobs might make sense there.

memcg.dirty_weight or memcg.dirty.weight
memcg.dirty_bps or memcg.dirty.write_bps

It is just that we control not the *absolute amount* of memory but the
*rate* of writing to memory. I think that makes it somewhat confusing and
gives the impression that it should be part of the block IO controller.
I am kind of split on this (leaning slightly towards the memory
controller), so I am raising the question and others can weigh in with
their thoughts on what makes more sense here.

Thanks
Vivek