On Thu, 18 Aug 2011 00:10:56 -0700 Greg Thelen <gthelen@xxxxxxxxxx> wrote: > KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> writes: > > > On Wed, 17 Aug 2011 09:15:03 -0700 > > Greg Thelen <gthelen@xxxxxxxxxx> wrote: > > > >> When the system is under background dirty memory threshold but some > >> cgroups are over their background dirty memory thresholds, then only > >> writeback inodes associated with the over-limit cgroups. > >> > >> In addition to checking if the system dirty memory usage is over the > >> system background threshold, over_bground_thresh() now checks if any > >> cgroups are over their respective background dirty memory thresholds. > >> > >> If over-limit cgroups are found, then the new > >> wb_writeback_work.for_cgroup field is set to distinguish between system > >> and memcg overages. The new wb_writeback_work.shared_inodes field is > >> also set. Inodes written by multiple cgroup are marked owned by > >> I_MEMCG_SHARED rather than a particular cgroup. Such shared inodes > >> cannot easily be attributed to a cgroup, so per-cgroup writeback > >> (futures version of wakeup_flusher_threads and balance_dirty_pages) > >> performs suboptimally in the presence of shared inodes. Therefore, > >> write shared inodes when performing cgroup background writeback. > >> > >> If performing cgroup writeback, move_expired_inodes() skips inodes that > >> do not contribute dirty pages to the cgroup being written back. > >> > >> After writing some pages, wb_writeback() will call > >> mem_cgroup_writeback_done() to update the set of over-bg-limits memcg. > >> > >> This change also makes wakeup_flusher_threads() memcg aware so that > >> per-cgroup try_to_free_pages() is able to operate more efficiently > >> without having to write pages of foreign containers. This change adds a > >> mem_cgroup parameter to wakeup_flusher_threads() to allow callers, > >> especially try_to_free_pages() and foreground writeback from > >> balance_dirty_pages(), to specify a particular cgroup to write inodes > >> from. > >> > >> Signed-off-by: Greg Thelen <gthelen@xxxxxxxxxx> > >> --- > >> Changelog since v8: > >> > >> - Added optional memcg parameter to __bdi_start_writeback(), > >> bdi_start_writeback(), wakeup_flusher_threads(), writeback_inodes_wb(). > >> > >> - move_expired_inodes() now uses pass in struct wb_writeback_work instead of > >> struct writeback_control. > >> > >> - Added comments to over_bground_thresh(). > >> > >> fs/buffer.c | 2 +- > >> fs/fs-writeback.c | 96 +++++++++++++++++++++++++++++++++----------- > >> fs/sync.c | 2 +- > >> include/linux/writeback.h | 6 ++- > >> mm/backing-dev.c | 3 +- > >> mm/page-writeback.c | 3 +- > >> mm/vmscan.c | 3 +- > >> 7 files changed, 84 insertions(+), 31 deletions(-) > >> > >> diff --git a/fs/buffer.c b/fs/buffer.c > >> index dd0220b..da1fb23 100644 > >> --- a/fs/buffer.c > >> +++ b/fs/buffer.c > >> @@ -293,7 +293,7 @@ static void free_more_memory(void) > >> struct zone *zone; > >> int nid; > >> > >> - wakeup_flusher_threads(1024); > >> + wakeup_flusher_threads(1024, NULL); > >> yield(); > >> > >> for_each_online_node(nid) { > >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > >> index e91fb82..ba55336 100644 > >> --- a/fs/fs-writeback.c > >> +++ b/fs/fs-writeback.c > >> @@ -38,10 +38,14 @@ struct wb_writeback_work { > >> struct super_block *sb; > >> unsigned long *older_than_this; > >> enum writeback_sync_modes sync_mode; > >> + unsigned short memcg_id; /* If non-zero, then writeback specified > >> + * cgroup. */ > >> unsigned int tagged_writepages:1; > >> unsigned int for_kupdate:1; > >> unsigned int range_cyclic:1; > >> unsigned int for_background:1; > >> + unsigned int for_cgroup:1; /* cgroup writeback */ > >> + unsigned int shared_inodes:1; /* write inodes spanning cgroups */ > >> > >> struct list_head list; /* pending work list */ > >> struct completion *done; /* set if the caller waits */ > >> @@ -114,9 +118,12 @@ static void bdi_queue_work(struct backing_dev_info *bdi, > >> spin_unlock_bh(&bdi->wb_lock); > >> } > >> > >> +/* > >> + * @memcg is optional. If set, then limit writeback to the specified cgroup. > >> + */ > >> static void > >> __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, > >> - bool range_cyclic) > >> + bool range_cyclic, struct mem_cgroup *memcg) > >> { > >> struct wb_writeback_work *work; > >> > >> @@ -136,6 +143,8 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, > >> work->sync_mode = WB_SYNC_NONE; > >> work->nr_pages = nr_pages; > >> work->range_cyclic = range_cyclic; > >> + work->memcg_id = memcg ? css_id(mem_cgroup_css(memcg)) : 0; > >> + work->for_cgroup = memcg != NULL; > >> > > > > > > I couldn't find a patch for mem_cgroup_css(NULL). Is it in patch 1-10 ? > > Other parts seems ok to me. > > > > > > Thanks, > > -Kame > > Mainline commit d324236b3333e87c8825b35f2104184734020d35 adds > mem_cgroup_css() to memcontrol.c. The above code does not call > mem_cgroup_css() with a NULL parameter due to the 'memcg ? ...' check. > So I do not think any additional changes to mem_cgroup_css() are needed. > Am I missing your point? > I thought you need == struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem) { + if (!mem) + return NULL; return &mem->css; } == And == unsigned short css_id(struct cgroup_subsys_state *css) { struct css_id *cssid; + if (!css) return 0; } == Thanks, -Kame -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html