On Wed, Mar 30, 2011 at 03:49:17PM -0700, Chad Talbott wrote:
> On Wed, Mar 30, 2011 at 3:20 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Wed, Mar 30, 2011 at 11:37:57AM -0400, Vivek Goyal wrote:
> >> We are planning to track the IO context of original submitter of IO
> >> by storing that information in page_cgroup. So that is not the problem.
> >>
> >> The problem google guys are trying to raise is that can a single flusher
> >> thread keep all the groups on bdi busy in such a way so that higher
> >> prio group can get more IO done.
> >
> > Which has nothing to do with IO-less dirty throttling at all!
>
> Not quite. Pre IO-less dirty throttling, any thread which was
> dirtying did the writeback itself. Because there's no shortage of
> threads to do the work, the IO scheduler sees a bunch of threads doing
> writes against a given BDI and schedules them against each other.
> This is how async IO isolation works for us.

And it's precisely this behaviour that makes foreground throttling a
scalability limitation, both from a list/lock contention POV and
from an IO optimisation POV.

> >> So the concern they raised that is single flusher thread per device
> >> is enough to keep faster cgroup full at the bdi and hence get the
> >> service differentiation.
> >
> > I think there's much bigger problems than that.
>
> We seem to be agreeing that it's a complicated problem. That's why I
> think async write isolation needs some design-level discussion.

From my perspective, we've still got a significant amount of work to
get writeback into a scalable form for current generation machines,
let alone future machines. Fixing the writeback code is a slow
process because of all the subtle interactions with different
filesystems and different workloads, which is made more complex by
the fact that many filesystems implement their own writeback paths
and have their own writeback semantics.

We need to make the right decision on what IO to issue, not just
issue lots of IO and hope it all turns out OK in the end. If we
can't get that decision matrix right for the simple case of a global
context, then we have no hope of extending it to cgroup-aware
writeback.

IOWs, we need to get writeback working in a scalable manner before
we complicate it immensely with all this cgroup and isolation
madness. Hence I think trying to make writeback cgroup-aware is
probably 6-12 months premature at this point and trying to do it now
will only serve to make it harder to get the common, simple cases
working as we desire them to...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx