On Thu 10-09-09 17:49:10, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 16:23 +0200, Jan Kara wrote:
> >   Well, what I imagined we could do is:
> > Have a per-bdi variable 'pages_written' - that would reflect the amount of
> > pages written to the bdi since boot (OK, we'd have to handle overflows but
> > that's doable).
> >
> > There will be a per-bdi variable 'pages_waited'. When a thread should sleep
> > in balance_dirty_pages() because we are over limits, it kicks the writeback
> > thread and does:
> >   to_wait = max(pages_waited, pages_written) + sync_dirty_pages() (or
> >     whatever number we decide)
> >   pages_waited = to_wait
> >   sleep until pages_written reaches to_wait or we drop below dirty limits.
> >
> > That will make sure each thread sleeps until the writeback threads have
> > done their duty for the writing thread.
> >
> > If we make sure sleeping threads are properly ordered on the wait queue,
> > we could always wake up just the first one and thus avoid the herding
> > effect. When we drop below dirty limits, we would just wake up the whole
> > wait queue.
> >
> > Does this sound reasonable?
>
> That seems to go wrong when there are multiple tasks waiting on the same
> bdi: you'd count each page for 1/n of its weight.
>
> Suppose pages_written = 1024, and 4 tasks block and compute their
> to_wait as pages_written + 256 = 1280; then we'd release all 4 of them
> after 256 pages are written, instead of after 4*256, which would be
> pages_written = 2048.

  Well, there's some locking needed, of course. The intent is to stack
demands as they come. So in the case pages_written = 1024, pages_waited = 1024,
we would do:

THREAD 1:
  spin_lock
  to_wait = 1024 + 256
  pages_waited = 1280
  spin_unlock

THREAD 2:
  spin_lock
  to_wait = 1280 + 256
  pages_waited = 1536
  spin_unlock

So the weight of each page is preserved.
  The fact that the second thread effectively waits until the first thread
has had its demand satisfied looks strange at first sight, but we don't do
any better currently, and I think it's fine: if both are writer threads,
the thread released first will soon queue up behind the thread still
waiting, so in the long term the behavior should be fair.

                                                                Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR