On Thu, Oct 08, 2009 at 09:01:59AM +0800, KAMEZAWA Hiroyuki wrote:
> On Wed, 07 Oct 2009 15:38:36 +0800
> Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
>
> > As proposed by Chris, Dave and Jan, let balance_dirty_pages() wait for
> > the per-bdi flusher to write back enough pages for it, instead of
> > starting foreground writeback by itself. By doing so we harvest two
> > benefits:
> > - avoid concurrent writeback of multiple inodes (Dave Chinner)
> >   If every thread doing writes and being throttled starts foreground
> >   writeback, it leads to N IO submitters from at least N different
> >   inodes at the same time, ending up with N different sets of IO being
> >   issued with potentially zero locality to each other. This results in
> >   much lower elevator sort/merge efficiency, and hence we seek the disk
> >   all over the place to service the different sets of IO.
> >   OTOH, if there is only one submission thread, it doesn't jump between
> >   inodes in the same way when congestion clears: it keeps writing to
> >   the same inode, resulting in large related chunks of sequential IOs
> >   being issued to the disk. This is more efficient than the above
> >   foreground writeback because the elevator works better and the disk
> >   seeks less.
> > - avoid one constraint towards huge per-file nr_to_write
> >   The write_chunk used by balance_dirty_pages() should be small enough to
> >   prevent user-noticeable one-shot latency, i.e. each sleep/wait inside
> >   balance_dirty_pages() should be short enough. When it starts its own
> >   writeback, it must specify a small nr_to_write. The throttle wait queue
> >   removes this dependency as a side effect.
>
> May I ask a question? (maybe not directly related to this patch itself, sorry)

Sure :)

> Recent work such as "writeback: switch to per-bdi threads for flushing data"
> removed congestion_wait() from balance_dirty_pages() and added
> schedule_timeout_interruptible().
>
> And this one replaces it with wake_up+wait_queue.

Right.

> IIUC, "iowait" cpustat data is calculated from runqueue->nr_iowait as:
> == kernel/sched.c
> void account_idle_time(cputime_t cputime)
> {
>         struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>         cputime64_t cputime64 = cputime_to_cputime64(cputime);
>         struct rq *rq = this_rq();
>
>         if (atomic_read(&rq->nr_iowait) > 0)
>                 cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
>         else
>                 cpustat->idle = cputime64_add(cpustat->idle, cputime64);
> }
> ==
> Then, for showing "cpu is in iowait", runqueue->nr_iowait should be modified
> at some places. In old kernels, congestion_wait() et al. did that by calling
> io_schedule_timeout().
>
> How is runqueue->nr_iowait handled now?

Good question. io_schedule() has an old comment about throttling IO waits:

         * But don't do that if it is a deliberate, throttling IO wait (this task
         * has set its backing_dev_info: the queue against which it should throttle)
         */
        void __sched io_schedule(void)

So it looks like both Jens' patch and this one behave correctly in ignoring
the iowait accounting for balance_dirty_pages() :)

Thanks,
Fengguang