On Fri, Jan 25, 2013 at 06:49:34PM +0530, Suresh Jayaraman wrote:
> Hello,
>
> I'd like to discuss again[1] the problem of throttling buffered writes
> and a throttle mechanism that works for all kinds of I/O.
>
> Some background information.
>
> During last year's LSF/MM, Fengguang discussed his proportional I/O
> controller patches as part of the writeback session. The limitations
> seen in his approach were a) no handling of bursty IO submission in
> the flusher thread, b) sharing of config variables among different
> policies, and c) that it violates layering and lacks a long-term
> design. Tejun proposed a back-pressure approach to the problem, i.e.
> apply pressure where the problem is (the block layer) and propagate it
> upwards.
>
> The general opinion at that time was that we needed more
> inputs/consensus on the natural, flexible, extensible "interface".
> The discussion thread that Vivek started[2] to collect inputs on the
> "interface", though it resulted in a good collection of inputs, it is
> not clear whether it represents inputs from all the interested
> parties.
>
> At Kernel Summit last year, I learned from LWN[3] that the topic was
> discussed again. Tejun apparently proposed a solution that splits up
> the global async CFQ queue by cgroup, so that the CFQ scheduler can
> easily schedule the per-cgroup sync/async queues according to the
> per-cgroup I/O weights. Fengguang proposed supporting per-cgroup
> buffered write weights in balance_dirty_pages() and running a
> user-space daemon that updates the CFQ/BDP weights every second.
> There doesn't seem to be consensus towards either of the proposed
> approaches.

Moving async queues into their respective cgroups is the easy part.
Also, for throttling you don't need CFQ, so CFQ and IO throttling are
somewhat orthogonal. (I am assuming that by throttling you mean
upper-limiting IO.) And I think Tejun wanted to implement throttling at
the block layer and wanted the VM to adjust/respond to the per-group IO
backlog when it comes to writing out dirty data/inodes.

Once we have taken care of the writeback problem, then comes the issue
of being able to associate a dirty inode/page with a cgroup. Not sure
whether anything has happened on that front. In the past the thinking
was to keep it simple: one inode belongs to one IO cgroup.

Also, seriously, in CFQ the performance penalty of group idling is too
high and might easily start showing up even on a single-spindle SATA
disk, especially given that people will come up with hybrid SATA drives
with some internal caching, so the SATA drive will not be as slow. So
proportional group scheduling in CFQ is limited to the specific corner
case of slow SATA drives, and I am not sure how many people really use
it.

To me, we first need some ideas on how to implement low-cost
proportional group scheduling (either in CFQ or in another scheduler).
Till then we can develop a lot of infrastructure, but its usability
will be very limited.

Thanks

Vivek
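
A note on the block-layer upper limiting mentioned above: for IO that
reaches the block layer directly, such caps can already be set through
the cgroup v1 blkio throttling files (blk-throttle). A minimal sketch,
assuming the blkio controller is mounted at /sys/fs/cgroup/blkio; the
cgroup name "grp1" and the 8:16 device numbers are only illustrative:

    # Create a cgroup and cap its writes to the 8:16 device (e.g. /dev/sdb)
    # at 1 MB/s using the blkio throttling interface.
    mkdir /sys/fs/cgroup/blkio/grp1
    echo "8:16 1048576" > /sys/fs/cgroup/blkio/grp1/blkio.throttle.write_bps_device
    # Move the current shell into the group so its IO is subject to the limit.
    echo $$ > /sys/fs/cgroup/blkio/grp1/tasks

These limits are enforced when bios are submitted to the block layer,
so buffered writes only hit them once the flusher threads issue the IO;
attributing that IO back to the cgroup that dirtied the pages is
exactly the writeback problem being discussed in this thread.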
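
On the group-idling cost: CFQ exposes its idling knobs per device under
the iosched sysfs directory, which is where the trade-off described
above is tuned. A small sketch, assuming CFQ is the active scheduler
for the device; the device name "sdb" is illustrative:

    # Inspect CFQ's idling knobs for one device (values in milliseconds).
    cat /sys/block/sdb/queue/iosched/slice_idle
    cat /sys/block/sdb/queue/iosched/group_idle
    # Setting group_idle to 0 disables idling between groups, which
    # typically recovers throughput at the cost of weaker isolation
    # between cgroups.
    echo 0 > /sys/block/sdb/queue/iosched/group_idle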