Re: [PATCH 0/6] buffered write IO controller in balance_dirty_pages()

On Tue, Apr 03, 2012 at 01:00:14AM -0700, Fengguang Wu wrote:

[CC Jens]

[..]
> > I think blkio.weight can be thought of a system wide weight of a cgroup
> > and more than one entity/subsystem should be able to make use of it and
> > differentiate between IO in its own way. CFQ can decide to do proportional
> > time division, and buffered write controller should be able to use the
> > same weight and do write bandwidth differentiation. I think it is better
> > than introducing another buffered write controller tunable for weight.
> > 
> > Personally, I am not too worried about this point. We can document and
> > explain it well.
> 
> Agreed. The throttling may work in *either* bps, IOPS or disk time
> modes. In each mode blkio.weight is naturally tied to the
> corresponding IO metrics.

Well, Tejun does not like the idea of sharing config variables among
different policies. So I guess you will have to come up with your
own configuration variables as desired. As each policy will have its
own configuration and stats, prefixing the variable/stat name with
the policy name will help identify it. Not sure what's a good name for
the buffered write policy.

Maybe

blkio.dirty.weight
blkio.dirty.bps
blkio.buffered_write.* or
blkio.buf_write* or
blkio.dirty_rate.*

[..]
> 
> Patch 6/6 shows simple test results for bps based throttling.
> 
> Since then I've improved the patches to work in a more "contained" way
> when blkio.throttle.buffered_write_bps is not set.
> 
> The old behavior is, if blkcg A contains 1 dd and blkcg B contains 10
> dd tasks and they have equal weight, B will get 10 times bandwidth
> than A.
> 
> With the below updated core bits, A and B will get equal share of
> write bandwidth. The basic idea is to use

Yes, this new behavior makes more sense. Two equal weight groups get
equal bandwidth irrespective of the number of tasks in each cgroup.
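
To make the difference concrete, here is a toy model of the two behaviors.
It is only a sketch and not the code from the patches: the function name,
the dictionary layout and the 110 MB/s figure are made up for illustration.

```python
# Hypothetical model, not kernel code: split write bandwidth between cgroups.
def group_bandwidth(groups, total_bw, per_cgroup):
    """groups maps name -> (weight, ntasks); returns name -> group bandwidth."""
    if per_cgroup:
        # New behavior: a group's share depends only on its weight.
        shares = {name: w for name, (w, n) in groups.items()}
    else:
        # Old behavior: every task counts on its own, so the share of a
        # group grows with the number of tasks in it.
        shares = {name: w * n for name, (w, n) in groups.items()}
    total = sum(shares.values())
    return {name: total_bw * s / total for name, s in shares.items()}

# A has 1 dd, B has 10 dd tasks, equal weights (all numbers are assumed).
groups = {"A": (500, 1), "B": (500, 10)}
old = group_bandwidth(groups, 110.0, per_cgroup=False)
new = group_bandwidth(groups, 110.0, per_cgroup=True)
print(old)  # B ends up with 10 times the bandwidth of A
print(new)  # A and B end up with the same bandwidth
```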

[..]
> Test results are "pretty good looking" :-) The attached graphs
> illustrates nice attributes of accuracy, fairness and smoothness
> for the following tests.

Indeed. These results are pretty cool. It is hard to believe that the lines
are so smooth and that the lines for two tasks overlap each other so closely
that it is not obvious at first that they are overlapping and dirtying
equal amounts of memory. I had to take a second look to figure that out.

Just that the results for the third graph (weight 500 and 1000 respectively)
are not perfect. I think ideally all 3 tasks should have dirtied the same
amount of memory. But I think achieving perfection here might not be
easy and maybe not many people will care.
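
The arithmetic behind that expectation can be sketched as below. This
assumes the weight 500 group runs one task and the weight 1000 group runs
two, which is a guess at the test layout; the function name and the 90 MB/s
total are hypothetical as well.

```python
# Hypothetical sketch: per-task bandwidth when groups share by weight and
# the tasks inside a group split the group's share evenly.
def per_task_bandwidth(groups, total_bw):
    """groups maps name -> (weight, ntasks); returns name -> per-task bandwidth."""
    total_weight = sum(w for w, _ in groups.values())
    return {
        name: total_bw * w / total_weight / n
        for name, (w, n) in groups.items()
    }

# Assumed layout: 1 task at weight 500, 2 tasks at weight 1000.
rates = per_task_bandwidth({"A": (500, 1), "B": (1000, 2)}, 90.0)
print(rates)  # each task gets one third of the total
```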

Given the fact that you are doing a reasonable job of providing service
differentiation between buffered writers, I am wondering if you should
look at the ioprio of writers within a cgroup and provide service
differentiation among those too. CFQ has separate queues but it loses
the context information by the time IO is submitted. So you might be
able to do a much better job. Anyway, this is a possible future
enhancement and not necessarily related to this patchset.

Also, we are controlling the rate of dirtying the memory. I am again
wondering whether these configuration knobs should be part of the memory
controller and not the block controller. Think of the NFS case. There is no
block device or block layer involved but we will control the rate of
dirtying memory. So some control in the memory controller might make
sense. And the following kind of knobs might make sense there.

memcg.dirty_weight or memcg.dirty.weight
memcg.dirty_bps or memcg.dirty.write_bps

It is just that we control not the *absolute amount* of memory but the
*rate* of writing to memory, and I think that makes it somewhat confusing
and gives the impression that it should be part of the block IO controller.
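
A small sketch of what "rate, not amount" means for a dirtier: the task is
paused long enough to hold it at the configured rate, while the total it
may dirty over time is not capped. The function name and the numbers are
hypothetical, and dirty_bps only stands in for whatever knob name is chosen.

```python
# Hypothetical sketch of rate-based throttling of a dirtier.
def pause_seconds(bytes_dirtied, dirty_bps):
    """Sleep time that keeps a task dirtying bytes_dirtied at dirty_bps."""
    return bytes_dirtied / dirty_bps

# Assumed numbers: 32 pages of 4 KiB dirtied, limit of 1 MiB/s.
pause = pause_seconds(32 * 4096, 1 << 20)
print(pause)  # seconds to sleep before dirtying the next batch
```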

I am kind of split on this (though slightly inclined towards the memory
controller), so I am raising the question and others can weigh in with
their thoughts on what makes more sense here.

Thanks
Vivek