On Tue, 21 Dec 2010, Wu Fengguang wrote: > > This avoids unnecessary checks and dirty throttling on tmpfs/ramfs. > > It also prevents > > [ 388.126563] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 > > in the balance_dirty_pages tracepoint, which will call > > dev_name(mapping->backing_dev_info->dev) > > but shmem_backing_dev_info.dev is NULL. > > Summary notes about the tmpfs/ramfs behavior changes: > > As for 2.6.36 and older kernels, the tmpfs writes will sleep inside > balance_dirty_pages() as long as we are over the (dirty+background)/2 > global throttle threshold. This is because both the dirty pages and > threshold will be 0 for tmpfs/ramfs. Hence this test will always > evaluate to TRUE: > > dirty_exceeded = > (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh) > || (nr_reclaimable + nr_writeback >= dirty_thresh); > > For 2.6.37, someone complained that the current logic does not allow the > users to set vm.dirty_ratio=0. So commit 4cbec4c8b9 changed the test to > > dirty_exceeded = > (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh) > || (nr_reclaimable + nr_writeback > dirty_thresh); > > So 2.6.37 will behave differently for tmpfs/ramfs: it will never get > throttled unless the global dirty threshold is exceeded (which is very > unlikely to happen; once happen, will block many tasks). > > I'd say that the 2.6.36 behavior is very bad for tmpfs/ramfs. It means > for a busy writing server, tmpfs write()s may get livelocked! The > "inadvertent" throttling can hardly bring help to any workload because > of its "either no throttling, or get throttled to death" property. > > So based on 2.6.37, this patch won't bring more noticeable changes. > > CC: Hugh Dickins <hughd@xxxxxxxxxx> > CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> > Acked-by: Rik van Riel <riel@xxxxxxxxxx> > Reviewed-by: Minchan Kim <minchan.kim@xxxxxxxxx> > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> Thanks a lot for investigating further and writing it all up here. Acked-by: Hugh Dickins <hughd@xxxxxxxxxx> I notice bdi_cap_writeback_dirty go from bdi_writeout_fraction(), and bdi_cap_account_dirty appear in balance_dirty_pages_ratelimited_nr(): maybe one day a patch to use just one flag throughout? Unless you can dream up a use for the divergence. (I hate wasting brainpower trying to decide which of two always-the-sames to use, like page_cache_release() and put_page(), until there's actual code to distinguish them.) Hugh > --- > mm/page-writeback.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > --- linux-next.orig/mm/page-writeback.c 2010-12-18 09:14:53.000000000 +0800 > +++ linux-next/mm/page-writeback.c 2010-12-21 17:35:44.000000000 +0800 > @@ -230,13 +230,8 @@ void task_dirty_inc(struct task_struct * > static void bdi_writeout_fraction(struct backing_dev_info *bdi, > long *numerator, long *denominator) > { > - if (bdi_cap_writeback_dirty(bdi)) { > - prop_fraction_percpu(&vm_completions, &bdi->completions, > + prop_fraction_percpu(&vm_completions, &bdi->completions, > numerator, denominator); > - } else { > - *numerator = 0; > - *denominator = 1; > - } > } > > static inline void task_dirties_fraction(struct task_struct *tsk, > @@ -878,6 +873,9 @@ void balance_dirty_pages_ratelimited_nr( > { > struct backing_dev_info *bdi = mapping->backing_dev_info; > > + if (!bdi_cap_account_dirty(bdi)) > + return; > + > current->nr_dirtied += nr_pages_dirtied; > > if (unlikely(!current->nr_dirtied_pause)) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html