On Tue, Feb 15, 2011 at 4:54 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Fri 04-02-11 15:07:15, Chad Talbott wrote:
>> Per-cgroup dirty ratios is just the beginning, as you mention. Unless
>> the IO scheduler can see the deep queues of all the blocked tasks, it
>> can't make the right decisions. Also, today writeback is ignorant of
>> the tasks' debt to the IO scheduler, so it issues the "wrong" inodes.
>
> I'm curious: Could you elaborate a bit more about this? I'm not sure what
> a debt to the IO scheduler is and why choice of inodes would matter...

Sorry, this comment needs more context.  Google's servers typically
operate with both memory capacity and disk time isolation via cgroups.
We maintain a set of patches that provide page tracking and buffered
write isolation, and we are working on sending those patches out
alongside the memcg efforts.

A scenario: when a given cgroup reaches its foreground writeback
high-water mark, it invokes the writeback code to send dirty inodes
belonging to that cgroup to the IO scheduler.  CFQ can then see those
requests and schedule them against other requests in the system.

If the thread doing the dirtying issues the IO directly, then CFQ can
see all the individual threads waiting on IO.  CFQ schedules between
them and provides the requested disk time isolation.

If the background writeout thread does the IO, the picture is
different.  Since there is only a single flusher thread per disk, the
order in which inodes are issued to the IO scheduler matters.  The
writeback code issues fairly large chunks from each inode, so from the
IO scheduler's point of view it only sees IO from a single cgroup while
it works on that chunk.  As a result, CFQ cannot provide fairness
between buffered writers.

Chad
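
Below is a minimal userspace sketch of the distinction Chad describes,
not code from the patches he mentions.  With buffered writes the pages
are only dirtied here and later written out by the flusher thread, so
the IO scheduler sees the flusher's context; with O_DIRECT the request
is submitted from the writing thread itself, so CFQ can attribute it to
that thread.  The file name, sizes, and loop count are arbitrary and
only for illustration.

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BUF_SIZE (1 << 20)      /* 1 MiB, aligned for O_DIRECT */

    int main(int argc, char **argv)
    {
            int use_direct = (argc > 1 && !strcmp(argv[1], "direct"));
            int flags = O_WRONLY | O_CREAT | O_TRUNC |
                        (use_direct ? O_DIRECT : 0);
            void *buf;
            int fd, i;

            /* O_DIRECT requires an aligned buffer. */
            if (posix_memalign(&buf, 4096, BUF_SIZE)) {
                    perror("posix_memalign");
                    return 1;
            }
            memset(buf, 0xab, BUF_SIZE);

            fd = open("testfile", flags, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            for (i = 0; i < 64; i++) {
                    if (write(fd, buf, BUF_SIZE) != BUF_SIZE) {
                            perror("write");
                            return 1;
                    }
            }

            /*
             * Buffered case: the data is still in the page cache at
             * this point; it reaches the disk only when the flusher
             * thread (or this explicit fsync) writes it back.
             */
            if (!use_direct)
                    fsync(fd);

            close(fd);
            free(buf);
            return 0;
    }

Run as "./a.out" for the buffered path or "./a.out direct" for the
O_DIRECT path; in the buffered case the disk IO is issued from the
per-bdi flusher thread rather than from this process.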