On Tue 08-09-09 20:32:26, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 19:55 +0200, Peter Zijlstra wrote:
>
> > I think I'm somewhat confused here though..
> >
> > There's kernel threads doing writeout, and there's apps getting stuck in
> > balance_dirty_pages().
> >
> > If we want all writeout to be done by kernel threads (bdi/pd-flush like
> > things) then we still need to manage the actual apps and delay them.
> >
> > As things stand now, we kick pdflush into action when dirty levels are
> > above the background level, and start writing out from the app task when
> > we hit the full dirty level.
> >
> > Moving all writeout to a kernel thread sounds good from writing linear
> > stuff pov, but what do we make apps wait on then?
>
> OK, so like said in the previous email, we could have these app tasks
> simply sleep on a waitqueue which gets periodic wakeups from
> __bdi_writeback_inc() every time the dirty threshold drops.
>
> The woken tasks would then check their bdi dirty limit (its task
> dependent) against the current values and either go back to sleep or
> back to work.

  Well, what I imagined we could do is:
Have a per-bdi variable 'pages_written' that would reflect the number of
pages written to the bdi since boot (OK, we'd have to handle overflows but
that's doable).

There will be a per-bdi variable 'pages_waited'. When a thread should sleep
in balance_dirty_pages() because we are over limits, it kicks the writeback
thread and does:
  to_wait = max(pages_waited, pages_written) + sync_dirty_pages()
            (or whatever number we decide)
  pages_waited = to_wait
  sleep until pages_written reaches to_wait or we drop below dirty limits.

That will make sure each thread sleeps until the writeback threads have done
their duty for the writing thread.

If we make sure sleeping threads are properly ordered on the wait queue,
we could always wake up just the first one and thus avoid the herding
effect. When we drop below dirty limits, we would just wake up the whole
waitqueue.

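To make the accounting above concrete, here is a minimal userspace sketch,
not kernel code. It only models the two counters and the wake condition;
there is no real sleeping, waitqueue or locking (in the kernel the updates
would need to be serialized somehow). All names (struct bdi_model,
counter_reached(), bdi_claim_wait_target(), bdi_account_written(),
bdi_wait_done()) are hypothetical, and 'quota' stands in for
sync_dirty_pages() "or whatever number we decide". The overflow handling
uses a signed-difference comparison in the style of time_after(), which is
one possible approach and assumes the counters never drift 2^31 or more
pages apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-bdi state; not a real kernel struct. */
struct bdi_model {
	uint32_t pages_written;	/* pages written to this bdi since boot */
	uint32_t pages_waited;	/* highest target claimed by any waiter */
};

/*
 * Wrap-safe "a is at or past b". Correct as long as the two counters are
 * less than 2^31 apart (relies on two's complement conversion).
 */
static bool counter_reached(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Called by a task about to be throttled: compute
 *   to_wait = max(pages_waited, pages_written) + quota
 * record it in pages_waited and return it as this task's target.
 */
static uint32_t bdi_claim_wait_target(struct bdi_model *bdi, uint32_t quota)
{
	uint32_t base;

	assert(quota > 0);
	base = counter_reached(bdi->pages_waited, bdi->pages_written) ?
		bdi->pages_waited : bdi->pages_written;
	bdi->pages_waited = base + quota;	/* may wrap, by design */
	return bdi->pages_waited;
}

/* Called by the writeback side whenever it completes some pages. */
static void bdi_account_written(struct bdi_model *bdi, uint32_t pages)
{
	bdi->pages_written += pages;		/* may wrap, by design */
}

/*
 * Wake condition for a waiter holding target 'to_wait': either enough
 * pages were written, or we dropped below the dirty limits.
 */
static bool bdi_wait_done(const struct bdi_model *bdi, uint32_t to_wait,
			  bool below_dirty_limit)
{
	return below_dirty_limit ||
	       counter_reached(bdi->pages_written, to_wait);
}
```

Because each waiter starts from the previous waiter's target, the targets
are handed out in increasing order, which is what lets an ordered wait
queue wake only its first entry.
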
Does this sound reasonable?

> The only problem would be the mass wakeups when lots of tasks are
> blocked on dirty, but I'm guessing there's no way around that anyway,
> and its better to have a limited number of writers than have everybody
> write something, which would result in massive write fragmentation.

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR