On Wed, Mar 30, 2011 at 3:20 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Mar 30, 2011 at 11:37:57AM -0400, Vivek Goyal wrote:
>> We are planning to track the IO context of original submitter of IO
>> by storing that information in page_cgroup. So that is not the problem.
>>
>> The problem google guys are trying to raise is that can a single flusher
>> thread keep all the groups on bdi busy in such a way so that higher
>> prio group can get more IO done.
>
> Which has nothing to do with IO-less dirty throttling at all!

Not quite. Before IO-less dirty throttling, any thread which was dirtying
pages did the writeback itself. Because there's no shortage of threads to
do the work, the IO scheduler sees a bunch of threads doing writes against
a given BDI and schedules them against each other. This is how async IO
isolation works for us.

>> It should not happen that flusher
>> thread gets blocked somewhere (trying to get request descriptors on
>> request queue)
>
> A major design principle of the bdi-flusher threads is that they
> are supposed to block when the request queue gets full - that's how
> we got rid of all the congestion garbage from the writeback
> stack.

With IO cgroups and async write isolation, there are multiple queues per
disk that all need to be filled to allow cgroup-aware CFQ to schedule
between them. If the per-BDI threads could be taught to fill each
per-cgroup queue before giving up on a BDI, then IO-less throttling could
work. Alternatively, per-(BDI, blkio cgroup) flusher threads would work.
I think it's complicated enough to warrant a discussion.

> There are plans to move the bdi-flusher threads to work queues, and
> once that is done all your concerns about blocking and parallelism
> are pretty much gone because it's trivial to have multiple writeback
> works in progress at once on the same bdi with that infrastructure.

This sounds promising.
>> So the concern they raised that is single flusher thread per device
>> is enough to keep faster cgroup full at the bdi and hence get the
>> service differentiation.
>
> I think there's much bigger problems than that.

We seem to be agreeing that it's a complicated problem. That's why I
think async write isolation needs some design-level discussion.

Chad
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html