On Tue, Feb 15, 2011 at 4:54 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Fri 04-02-11 15:07:15, Chad Talbott wrote:
>> Per-cgroup dirty ratios is just the beginning, as you mention. Unless
>> the IO scheduler can see the deep queues of all the blocked tasks, it
>> can't make the right decisions. Also, today writeback is ignorant of
>> the tasks' debt to the IO scheduler, so it issues the "wrong" inodes.
>
> I'm curious: Could you elaborate a bit more about this? I'm not sure what
> a debt to the IO scheduler is and why choice of inodes would matter...

Sorry, this comment needs more context.  Google's servers typically
operate with both memory capacity and disk time isolation via cgroups.
We maintain a set of patches that provide page tracking and buffered
write isolation, and we are working on sending those patches out
alongside the memcg efforts.

A scenario: when a given cgroup reaches its foreground writeback
high-water mark, it invokes the writeback code to send dirty inodes
belonging to that cgroup to the IO scheduler.  CFQ can then see those
requests and schedule them against other requests in the system.

If the thread doing the dirtying issues the IO directly, then CFQ can
see all the individual threads waiting on IO.  CFQ schedules between
them and provides the requested disk time isolation.

If the background writeout thread does the IO, the picture is
different.  Since there is only a single flusher thread per disk, the
order in which inodes are issued to the IO scheduler matters.  The
writeback code issues fairly large chunks from each inode, so from the
IO scheduler's point of view it only sees IO from a single cgroup while
it works on that chunk.  As a result, CFQ cannot provide fairness
between buffered writers.

Chad
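
Below is a minimal userspace sketch of the distinction Chad describes,
not code from the patches he mentions.  With buffered writes the pages
are only dirtied here and later written out by the flusher thread, so
the IO scheduler sees the flusher's context; with O_DIRECT the request
is submitted from the writing thread itself, so CFQ can attribute it to
that thread.  The file name, sizes, and loop count are arbitrary and
only for illustration.

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BUF_SIZE (1 << 20)      /* 1 MiB, aligned for O_DIRECT */

    int main(int argc, char **argv)
    {
            int use_direct = (argc > 1 && !strcmp(argv[1], "direct"));
            int flags = O_WRONLY | O_CREAT | O_TRUNC |
                        (use_direct ? O_DIRECT : 0);
            void *buf;
            int fd, i;

            /* O_DIRECT requires an aligned buffer. */
            if (posix_memalign(&buf, 4096, BUF_SIZE)) {
                    perror("posix_memalign");
                    return 1;
            }
            memset(buf, 0xab, BUF_SIZE);

            fd = open("testfile", flags, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            for (i = 0; i < 64; i++) {
                    if (write(fd, buf, BUF_SIZE) != BUF_SIZE) {
                            perror("write");
                            return 1;
                    }
            }

            /*
             * Buffered case: the data is still in the page cache at
             * this point; it reaches the disk only when the flusher
             * thread (or this explicit fsync) writes it back.
             */
            if (!use_direct)
                    fsync(fd);

            close(fd);
            free(buf);
            return 0;
    }

Run as "./a.out" for the buffered path or "./a.out direct" for the
O_DIRECT path; in the buffered case the disk IO is issued from the
per-bdi flusher thread rather than from this process.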