Re: [LSF/MM TOPIC] [ATTEND] Throttling I/O

On Fri, Jan 25, 2013 at 06:49:34PM +0530, Suresh Jayaraman wrote:
> Hello,
> 
> I'd like to discuss again[1] the problem of throttling buffered writes
> and a throttle mechanism that works for all kinds of I/O.
> 
> Some background information.
> 
> During last year's LSF/MM, Fengguang discussed his proportional I/O
> controller patches as part of the writeback session. The limitations
> seen in his approach were a) it did not handle bursty IO submission in
> the flusher thread, b) it shared config variables among different
> policies, and c) it violated layering and lacked a long-term design.
> Tejun proposed a back-pressure approach to the problem, i.e. apply
> pressure where the problem is (the block layer) and propagate it
> upwards.
> 
> The general opinion at that time was that we needed more input and
> consensus on a natural, flexible, extensible "interface". The
> discussion thread that Vivek started[2] to collect input on the
> "interface" did gather a good set of suggestions, but I am not sure
> whether it represents input from all the interested parties.
> 
> At Kernel Summit last year, I learned from LWN[3] that the topic was
> discussed again. Tejun apparently proposed a solution that splits up
> the global async CFQ queue by cgroup, so that the CFQ scheduler can
> easily schedule the per-cgroup sync/async queues according to the
> per-cgroup I/O weights. Fengguang proposed supporting per-cgroup
> buffered write weights in balance_dirty_pages() and running a
> user-space daemon that updates the CFQ/BDP weights every second. There
> doesn't seem to be consensus on either of the proposed approaches.
> 

Moving async queues into their respective cgroups is the easy part. Also,
you don't need CFQ for throttling, so CFQ and IO throttling are somewhat
orthogonal. (I am assuming that by throttling you mean upper-limiting IO.)
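
(Just to be concrete: that kind of upper limit already exists today via
the blk-throttle files of the blkio controller. A minimal example,
assuming the controller is mounted at /sys/fs/cgroup/blkio and picking
the device numbers, group name and the 1 MB/s limit arbitrarily:

  mkdir /sys/fs/cgroup/blkio/grp1
  # cap WRITE bandwidth of this group on /dev/sdb (8:16) to 1 MB/s
  echo "8:16 1048576" > /sys/fs/cgroup/blkio/grp1/blkio.throttle.write_bps_device
  # move the current shell into the group
  echo $$ > /sys/fs/cgroup/blkio/grp1/tasks

The catch, and the subject of this thread, is that buffered writes reach
the block layer as flusher writeback, so they escape such a per-group
limit.)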

And I think Tejun wanted to implement throttling at the block layer and
have the VM adjust/respond to the per-group IO backlog when it comes to
writing out dirty data/inodes.

Once we have taken care of the writeback problem, then comes the issue
of being able to associate a dirty inode/page with a cgroup. I am not
sure whether anything has happened on that front. In the past the
simplifying assumption was that one inode belongs to one IO cgroup.

Also, seriously, in CFQ the performance penalty of group idling is too
high and can easily start showing up even on a single-spindle SATA disk,
especially given that people will come up with hybrid SATA drives with
some internal caching, so a SATA drive will not be as slow.
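
(About the only mitigation available today is to give up some isolation
and turn group idling off; the device name is just an example and this
assumes CFQ is the elevator on it:

  echo 0 > /sys/block/sdb/queue/iosched/group_idle

but then service differentiation largely disappears for groups that
don't keep the queue busy.)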

So proportional group scheduling in CFQ is limited to the specific
corner case of slow SATA drives. I am not sure how many people really
use it.
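
(The interface itself is trivial, just the blkio weight files, e.g.:

  mkdir /sys/fs/cgroup/blkio/fast /sys/fs/cgroup/blkio/slow
  echo 800 > /sys/fs/cgroup/blkio/fast/blkio.weight
  echo 200 > /sys/fs/cgroup/blkio/slow/blkio.weight

so the hard part is not configuring the weights, it is making them
effective without paying the idling cost above.)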

To me, we first need some ideas on how to implement low-cost
proportional group scheduling (either in CFQ or in another scheduler).
Until then we can develop a lot of infrastructure, but its usability
will be very limited.

Thanks
Vivek

