On Tue, Mar 08, 2011 at 11:31:13PM +0100, Jan Kara wrote:

[..]
> +/*
> + * balance_dirty_pages() must be called by processes which are generating dirty
> + * data.  It looks at the number of dirty pages in the machine and will force
> + * the caller to perform writeback if the system is over `vm_dirty_ratio'.
> + * If we're over `background_thresh' then the writeback threads are woken to
> + * perform some writeout.
> + */
> +static void balance_dirty_pages(struct address_space *mapping,
> +				unsigned long write_chunk)
> +{
> +	struct backing_dev_info *bdi = mapping->backing_dev_info;
> +	struct balance_waiter bw;
> +	struct dirty_limit_state st;
> +	int dirty_exceeded = check_dirty_limits(bdi, &st);
> +
> +	if (dirty_exceeded < DIRTY_MAY_EXCEED_LIMIT ||
> +	    (dirty_exceeded == DIRTY_MAY_EXCEED_LIMIT &&
> +	     !bdi_task_limit_exceeded(&st, current))) {
> +		if (bdi->dirty_exceeded &&
> +		    dirty_exceeded < DIRTY_MAY_EXCEED_LIMIT)
> +			bdi->dirty_exceeded = 0;
>  		/*
> -		 * Increase the delay for each loop, up to our previous
> -		 * default of taking a 100ms nap.
> +		 * In laptop mode, we wait until hitting the higher threshold
> +		 * before starting background writeout, and then write out all
> +		 * the way down to the lower threshold.  So slow writers cause
> +		 * minimal disk activity.
> +		 *
> +		 * In normal mode, we start background writeout at the lower
> +		 * background_thresh, to keep the amount of dirty memory low.
>  		 */
> -		pause <<= 1;
> -		if (pause > HZ / 10)
> -			pause = HZ / 10;
> +		if (!laptop_mode && dirty_exceeded == DIRTY_EXCEED_BACKGROUND)
> +			bdi_start_background_writeback(bdi);
> +		return;
>  	}
>
> -	/* Clear dirty_exceeded flag only when no task can exceed the limit */
> -	if (!min_dirty_exceeded && bdi->dirty_exceeded)
> -		bdi->dirty_exceeded = 0;
> +	if (!bdi->dirty_exceeded)
> +		bdi->dirty_exceeded = 1;
>
> -	if (writeback_in_progress(bdi))
> -		return;
> +	trace_writeback_balance_dirty_pages_waiting(bdi, write_chunk);
> +	/* Kick flusher thread to start doing work if it isn't already */
> +	bdi_start_background_writeback(bdi);
>
> +	bw.bw_wait_pages = write_chunk;
> +	bw.bw_task = current;
> +	spin_lock(&bdi->balance_lock);
>  	/*
> -	 * In laptop mode, we wait until hitting the higher threshold before
> -	 * starting background writeout, and then write out all the way down
> -	 * to the lower threshold.  So slow writers cause minimal disk activity.
> -	 *
> -	 * In normal mode, we start background writeout at the lower
> -	 * background_thresh, to keep the amount of dirty memory low.
> +	 * First item? Need to schedule distribution of IO completions among
> +	 * items on balance_list
> +	 */
> +	if (list_empty(&bdi->balance_list)) {
> +		bdi->written_start = bdi_stat_sum(bdi, BDI_WRITTEN);
> +		/* FIXME: Delay should be autotuned based on dev throughput */
> +		schedule_delayed_work(&bdi->balance_work, HZ/10);
> +	}
> +	/*
> +	 * Add work to the balance list, from now on the structure is handled
> +	 * by distribute_page_completions()
> +	 */
> +	list_add_tail(&bw.bw_list, &bdi->balance_list);
> +	bdi->balance_waiters++;

Hi Jan,

I had a question.

- What ensures that the flusher thread will not stop writing back until all
  the waiters on the bdi have been woken up? IIUC, the flusher thread will
  stop once the global background ratio is within the limit. Is it possible
  that some waiters on a bdi are still waiting for more pages to finish
  writeback, and that this might not happen for some time?
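
To make the concern concrete, here is a toy userspace model of the scenario I have in mind. It is not kernel code: struct toy_state, toy_waiter_woken() and all the numbers are made up for illustration. It assumes a single bdi with one waiter, no new dirtying while the flusher runs, and that every completed page is credited to that waiter.

```c
#include <stdbool.h>

/* Toy model only; these names and fields are hypothetical, not from the patch. */
struct toy_state {
	unsigned long global_dirty;      /* dirty pages system-wide */
	unsigned long background_thresh; /* flusher stops at or below this */
	unsigned long waiter_pages;      /* completions the waiter still needs */
};

/*
 * Let the flusher write 'chunk' pages per step until the global dirty count
 * is no longer above the background threshold. Returns true if the waiter's
 * remaining page count reached zero before the flusher stopped.
 */
static bool toy_waiter_woken(struct toy_state *s, unsigned long chunk)
{
	if (chunk == 0)
		return s->waiter_pages == 0; /* no progress is possible */

	while (s->global_dirty > s->background_thresh) {
		/* Cannot clean more pages than are dirty. */
		unsigned long done = chunk < s->global_dirty ?
				     chunk : s->global_dirty;

		s->global_dirty -= done;
		/* Credit the completed pages to the single waiter. */
		s->waiter_pages = s->waiter_pages > done ?
				  s->waiter_pages - done : 0;
		if (s->waiter_pages == 0)
			return true;
	}
	/* Flusher stopped because the background threshold was reached. */
	return s->waiter_pages == 0;
}
```

With global_dirty = 1000, background_thresh = 900, a waiter needing 200 pages and a chunk of 50, the flusher in this model stops after two chunks and the waiter is still 100 pages short. Nothing in the model wakes it after that, which is the case I am asking about.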

Thanks
Vivek