On Wed, Feb 1, 2012 at 10:33 PM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> Hi Greg,
>
> On Wed, Feb 01, 2012 at 12:24:25PM -0800, Greg Thelen wrote:
>> 1. how to compute per-container pause based on bdi bandwidth, cgroup
>> dirty page usage.
>> 2. how to ensure that writeback will engage even if system and bdi are
>> below respective background dirty ratios, yet a memcg is above its bg
>> dirty limit.
>
> The solution to (1,2) would be something like this:
>
> --- linux-next.orig/mm/page-writeback.c 2012-02-02 14:13:45.000000000 +0800
> +++ linux-next/mm/page-writeback.c      2012-02-02 14:24:11.000000000 +0800
> @@ -654,6 +654,17 @@ static unsigned long bdi_position_ratio(
>         pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>         pos_ratio += 1 << RATELIMIT_CALC_SHIFT;
>
> +       if (memcg) {
> +               long long f;
> +               x = div_s64((memcg_setpoint - memcg_dirty) << RATELIMIT_CALC_SHIFT,
> +                           memcg_limit - memcg_setpoint + 1);
> +               f = x;
> +               f = f * x >> RATELIMIT_CALC_SHIFT;
> +               f = f * x >> RATELIMIT_CALC_SHIFT;
> +               f += 1 << RATELIMIT_CALC_SHIFT;
> +               pos_ratio = pos_ratio * f >> RATELIMIT_CALC_SHIFT;
> +       }
> +
>         /*
>          * We have computed basic pos_ratio above based on global situation. If
>          * the bdi is over/under its share of dirty pages, we want to scale
> @@ -1202,6 +1213,8 @@ static void balance_dirty_pages(struct a
>                 freerun = dirty_freerun_ceiling(dirty_thresh,
>                                                 background_thresh);
>                 if (nr_dirty <= freerun) {
> +                       if (memcg && memcg_dirty > memcg_freerun)
> +                               goto start_writeback;
>                         current->dirty_paused_when = now;
>                         current->nr_dirtied = 0;
>                         current->nr_dirtied_pause =
> @@ -1209,6 +1222,7 @@ static void balance_dirty_pages(struct a
>                         break;
>                 }
>
> +start_writeback:
>                 if (unlikely(!writeback_in_progress(bdi)))
>                         bdi_start_background_writeback(bdi);
>
>
> That makes the minimal change to enforce per-memcg dirty ratio.
> It could result in a less stable control system, but should still
> be able to balance things out.
>
> Thanks,
> Fengguang

Thank you for the quick patch.  It looks promising.  I can imagine how
this would wake up background writeback.  But I am unsure how
background writeback will do anything.  It seems like
over_bground_thresh() would not necessarily see system or bdi dirty
usage over respective limits.  In previously posted memcg writeback
patches this involved an fs-writeback.c call to
mem_cgroups_over_bground_dirty_thresh() to check for memcg dirty limit
compliance.  Do you think we still need such a call out to memcg from
writeback?
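
For anyone who wants to play with the numbers, the memcg factor from the
first quoted hunk can be exercised outside the kernel.  What follows is
only a minimal userspace sketch, not kernel code.  It assumes
RATELIMIT_CALC_SHIFT is 10, replaces div_s64() with a plain division,
and uses helper names (RATELIMIT_ONE, memcg_pos_factor(),
scale_pos_ratio()) that are made up for illustration and do not appear
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same shift as mm/page-writeback.c. */
#define RATELIMIT_CALC_SHIFT 10
/* Hypothetical name for fixed-point 1.0. */
#define RATELIMIT_ONE ((int64_t)1 << RATELIMIT_CALC_SHIFT)

/* Userspace stand-in for the kernel's div_s64(). */
static int64_t div_s64(int64_t dividend, int32_t divisor)
{
	return dividend / divisor;
}

/*
 * Hypothetical helper mirroring the "if (memcg)" block of the quoted
 * hunk.  Returns, in fixed point:
 *
 *   f = 1 + ((setpoint - dirty) / (limit - setpoint + 1))^3
 */
static int64_t memcg_pos_factor(long memcg_dirty, long memcg_setpoint,
				long memcg_limit)
{
	int64_t x, f;

	assert(memcg_limit >= memcg_setpoint);

	/*
	 * The patch left-shifts here; this sketch multiplies instead,
	 * because left-shifting a negative value is undefined in ISO C.
	 */
	x = div_s64((int64_t)(memcg_setpoint - memcg_dirty) * RATELIMIT_ONE,
		    (int32_t)(memcg_limit - memcg_setpoint + 1));
	f = x;
	/* The right shifts rely on arithmetic shift of negative values,
	 * which gcc and clang provide. */
	f = f * x >> RATELIMIT_CALC_SHIFT;	/* x^2 */
	f = f * x >> RATELIMIT_CALC_SHIFT;	/* x^3 */
	f += RATELIMIT_ONE;			/* 1 + x^3 */
	return f;
}

/* Hypothetical helper: apply the memcg factor to the global pos_ratio. */
static int64_t scale_pos_ratio(int64_t pos_ratio, int64_t f)
{
	return pos_ratio * f >> RATELIMIT_CALC_SHIFT;
}
```

With memcg_dirty equal to memcg_setpoint the factor is exactly 1.0 in
fixed point, so pos_ratio is left unchanged.  Below the setpoint the
factor rises above 1.0, and it drops toward zero as memcg_dirty
approaches memcg_limit.  As in the quoted hunk, nothing clamps the
factor, so it goes negative once memcg_dirty is well past memcg_limit.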