Hi Greg,

On Wed, Feb 01, 2012 at 12:24:25PM -0800, Greg Thelen wrote:
> On Tue, Jan 31, 2012 at 4:55 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > 4. dirty ratio
> > In the last year, patches were posted but not merged. I'd like to hear
> > about work in this area.
>
> I would like to attend to discuss this topic. I have not had much time to
> work on this recently, but should be able to focus on it more soon. The
> IO-less writeback changes require some redesign and may allow for a
> simpler implementation of mem_cgroup_balance_dirty_pages().
> Maintaining per-container dirty page counts, ratios, and limits is
> fairly easy, but integration with writeback is the challenge. My big
> questions are for the writeback people:
> 1. how to compute the per-container pause based on bdi bandwidth and
> cgroup dirty page usage.
> 2. how to ensure that writeback will engage even if the system and bdi
> are below their respective background dirty ratios, yet a memcg is above
> its background dirty limit.

The solution to (1) and (2) would be something like this:

--- linux-next.orig/mm/page-writeback.c	2012-02-02 14:13:45.000000000 +0800
+++ linux-next/mm/page-writeback.c	2012-02-02 14:24:11.000000000 +0800
@@ -654,6 +654,17 @@ static unsigned long bdi_position_ratio(
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
 	pos_ratio += 1 << RATELIMIT_CALC_SHIFT;
 
+	if (memcg) {
+		long long f;
+		x = div_s64((memcg_setpoint - memcg_dirty) << RATELIMIT_CALC_SHIFT,
+			    memcg_limit - memcg_setpoint + 1);
+		f = x;
+		f = f * x >> RATELIMIT_CALC_SHIFT;
+		f = f * x >> RATELIMIT_CALC_SHIFT;
+		f += 1 << RATELIMIT_CALC_SHIFT;
+		pos_ratio = pos_ratio * f >> RATELIMIT_CALC_SHIFT;
+	}
+
 	/*
 	 * We have computed basic pos_ratio above based on global situation.
 	 * If the bdi is over/under its share of dirty pages, we want to scale
@@ -1202,6 +1213,8 @@ static void balance_dirty_pages(struct a
 		freerun = dirty_freerun_ceiling(dirty_thresh,
 						background_thresh);
 		if (nr_dirty <= freerun) {
+			if (memcg && memcg_dirty > memcg_freerun)
+				goto start_writeback;
 			current->dirty_paused_when = now;
 			current->nr_dirtied = 0;
 			current->nr_dirtied_pause =
@@ -1209,6 +1222,7 @@ static void balance_dirty_pages(struct a
 			break;
 		}
 
+start_writeback:
 		if (unlikely(!writeback_in_progress(bdi)))
 			bdi_start_background_writeback(bdi);

That makes the minimal change needed to enforce a per-memcg dirty ratio. It
could result in a less stable control system, but should still be able to
balance things out.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/