Re: IO less throttling and cgroup aware writeback (Was: Re: [Lsf] Preliminary Agenda and Activities for LSF)

Chad Talbott <ctalbott@xxxxxxxxxx> · Wed, 30 Mar 2011 15:49:17 -0700

On Wed, Mar 30, 2011 at 3:20 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Mar 30, 2011 at 11:37:57AM -0400, Vivek Goyal wrote:
>> We are planning to track the IO context of original submitter of IO
>> by storing that information in page_cgroup. So that is not the problem.
>>
>> The problem google guys are trying to raise is that can a single flusher
>> thread keep all the groups on bdi busy in such a way so that higher
>> prio group can get more IO done.
>
> Which has nothing to do with IO-less dirty throttling at all!

Not quite.  Before IO-less dirty throttling, any thread that was
dirtying pages did the writeback itself.  Because there's no shortage
of threads to do the work, the IO scheduler sees a bunch of threads
doing writes against a given BDI and schedules them against each
other.  This is how async IO isolation works for us.

>> It should not happen that flusher
>> thread gets blocked somewhere (trying to get request descriptors on
>> request queue)
>
> A major design principle of the bdi-flusher threads is that they
> are supposed to block when the request queue gets full - that's how
> we got rid of all the congestion garbage from the writeback
> stack.

With IO cgroups and async write isolation, there are multiple queues
per disk that all need to be filled to allow cgroup-aware CFQ to
schedule between them.  If the per-BDI threads could be taught to fill
each per-cgroup queue before giving up on a BDI, then IO-less
throttling could work.  Also, having per-(BDI, blkio cgroup)-flusher
threads would work.  I think it's complicated enough to warrant a
discussion.
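
To make the difference concrete, here is a toy userspace model, not
kernel code.  Everything in it is an assumption for illustration: the
names (blocking_flusher, fill_all_flusher, QUEUE_DEPTH), the fixed
queue depth, and the idea that the dirty list is walked in order.  It
only shows that a flusher which stops at the first full queue can
leave another cgroup's queue empty, while one that skips full queues
keeps every cgroup's queue populated.

```python
from collections import deque

QUEUE_DEPTH = 4  # hypothetical per-cgroup request-queue limit


def blocking_flusher(dirty, queues, depth=QUEUE_DEPTH):
    """Submit dirty pages in order and stop at the first full queue.

    The early return models a single flusher thread blocking while it
    waits for a request descriptor.
    """
    submitted = 0
    while dirty:
        cgroup, page = dirty[0]
        if len(queues[cgroup]) >= depth:
            break  # flusher is stuck; later cgroups get nothing
        queues[cgroup].append(page)
        dirty.popleft()
        submitted += 1
    return submitted


def fill_all_flusher(dirty, queues, depth=QUEUE_DEPTH):
    """Skip cgroups whose queue is full and keep filling the others."""
    deferred = deque()
    submitted = 0
    while dirty:
        cgroup, page = dirty.popleft()
        if len(queues[cgroup]) >= depth:
            deferred.append((cgroup, page))  # retry on a later pass
            continue
        queues[cgroup].append(page)
        submitted += 1
    dirty.extend(deferred)
    return submitted


def make_workload():
    """Six pages from a low-prio cgroup, then three from a high-prio one."""
    dirty = deque([("low", i) for i in range(6)] +
                  [("high", i) for i in range(3)])
    queues = {"low": [], "high": []}
    return dirty, queues


if __name__ == "__main__":
    for flusher in (blocking_flusher, fill_all_flusher):
        dirty, queues = make_workload()
        n = flusher(dirty, queues)
        print(flusher.__name__, "submitted", n,
              "high-prio queued:", len(queues["high"]))
```

In this model the blocking variant never queues anything for the
high-prio cgroup, so a scheduler choosing between per-cgroup queues
would have nothing to prioritize.  Per-(BDI, cgroup) flushers would
reach the same end state by a different route.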

> There are plans to move the bdi-flusher threads to work queues, and
> once that is done all your concerns about blocking and parallelism
> are pretty much gone because it's trivial to have multiple writeback
> works in progress at once on the same bdi with that infrastructure.

This sounds promising.

>> So the concern they raised that is single flusher thread per device
>> is enough to keep faster cgroup full at the bdi and hence get the
>> service differentiation.
>
> I think there's much bigger problems than that.

We seem to be agreeing that it's a complicated problem.  That's why I
think async write isolation needs some design-level discussion.

Chad