> Only if the queue depth is not bound. Queue depths are bound and so
> the distance we can go over the threshold is limited. This is the
> fundamental principle on which the throttling is based.....
>
> Hence, if the queue is not full, then we will have either written
> dirty pages to it (i.e wbc->nr_write != write_chunk so we will throttle
> or continue normally if write_chunk was written) or we have no more
> dirty pages left.
>
> Having no dirty pages left on the bdi and it not being congested
> means we effectively have a clean, idle bdi. We should not be trying
> to throttle writeback here - we can't do anything to improve the
> situation by continuing to try to do writeback on this bdi, so we
> may as well give up and let the writer continue. Once we have dirty
> pages on the bdi, we'll get throttled appropriately.

OK, you convinced me.

How about this patch? I introduced a new wbc counter, that sums the
number of dirty pages encountered, including ones already under
writeback.

Dave, big thanks for your insights.
Miklos

Index: linux/include/linux/writeback.h
===================================================================
--- linux.orig/include/linux/writeback.h	2007-03-14 22:43:42.000000000 +0100
+++ linux/include/linux/writeback.h	2007-03-14 22:58:56.000000000 +0100
@@ -44,6 +44,7 @@ struct writeback_control {
 	long nr_to_write;		/* Write this many pages, and decrement
 					   this for each page written */
 	long pages_skipped;		/* Pages which were not written */
+	long nr_dirty;			/* Number of dirty pages encountered */
 
 	/*
 	 * For a_ops->writepages(): is start or end are non-zero then this is
Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c	2007-03-14 22:41:01.000000000 +0100
+++ linux/mm/page-writeback.c	2007-03-14 23:00:20.000000000 +0100
@@ -220,6 +220,17 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+
+			/*
+			 * If just a few dirty pages were encountered, and
+			 * the queue is not congested, then allow this dirty
+			 * producer to continue. This resolves the deadlock
+			 * that happens when one filesystem writes back data
+			 * through another. It should also help when a slow
+			 * device is completely blocking other writes.
+			 */
+			if (wbc.nr_dirty < 8 && !bdi_write_congested(bdi))
+				break;
 		}
 		congestion_wait(WRITE, HZ/10);
 	}
@@ -612,6 +623,7 @@ retry:
 			min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
 		unsigned i;
 
+		wbc->nr_dirty += nr_pages;
 		scanned = 1;
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
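
For readers who want to follow the exit conditions without the kernel
tree at hand, here is a rough userspace sketch of the decision the
patched loop makes after each writeback pass. It is only a model under
stated assumptions: struct wbc_model, may_stop_throttling() and
FEW_DIRTY_PAGES are hypothetical names made up for illustration, and the
congested flag stands in for bdi_write_congested(bdi). Only the two
fields and the threshold of 8 are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical model; only these two fields mirror writeback_control. */
struct wbc_model {
	long nr_to_write;	/* pages left to write, decremented per page written */
	long nr_dirty;		/* dirty pages encountered, incl. those under writeback */
};

#define FEW_DIRTY_PAGES 8	/* threshold used in the patch */

/*
 * Return true when the dirty producer may leave the throttling loop
 * after one writeback pass, false when it would wait and retry.
 * 'congested' is an assumption standing in for bdi_write_congested().
 */
static bool may_stop_throttling(const struct wbc_model *wbc, long write_chunk,
				long *pages_written, bool congested)
{
	/* Account for what this pass managed to write. */
	*pages_written += write_chunk - wbc->nr_to_write;
	if (*pages_written >= write_chunk)
		return true;	/* a full chunk was written */

	/* Few dirty pages and an uncongested queue: clean, idle bdi. */
	if (wbc->nr_dirty < FEW_DIRTY_PAGES && !congested)
		return true;

	return false;		/* caller would congestion_wait() and loop */
}
```

In this model a pass that finds no dirty pages on an uncongested queue
returns true right away, which is the case the patch is meant to let
through. The same pass on a congested queue keeps the writer throttled.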