On Wed, 14 Apr 2010 12:06:22 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote: > At page migration, the new page is unlocked before calling end_migration(). > This is mis-understanding with page-migration code of memcg. > And FILE_MAPPED of migrated file cache is not properly updated, now. > > At migrating mapped file, events happens in following sequence. > > 1. allocate a new page. > 2. get memcg of an old page. > 3. charge ageinst new page, before migration. But at this point > no changes to page_cgroup, no commit-charge. > 4. page migration replaces radix-tree, old-page and new-page. > 5. page migration remaps the new page if the old page was mapped. > 6. memcg commits the charge for newpage. > > Because "commit" happens after page-remap, we lose file_mapped > accounting information at migration. > > For fixing all, this changes parepare/end migration. > New migration sequence with memcg is: > > 1. allocate a new page. > 2. charge against a new page onto old page's memcg. (here, the new page > is marked as PageCgroupUsed.) > 3. page migration replaces radix-tree, page table, etc... > 4. At remapping, FILE_MAPPED will be properly updated. (because newpage is marked as > USED, already) > 5. If anonymous page is freed before remap, check it and fixup accounting. > > Great! I like this change very much. Some comments are inlined. > Reported-by: Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> > --- > include/linux/memcontrol.h | 6 +- > mm/memcontrol.c | 95 ++++++++++++++++++++++++--------------------- > mm/migrate.c | 2 > 3 files changed, 56 insertions(+), 47 deletions(-) > > Index: mmotm-temp/mm/memcontrol.c > =================================================================== > --- mmotm-temp.orig/mm/memcontrol.c > +++ mmotm-temp/mm/memcontrol.c > @@ -2501,10 +2501,12 @@ static inline int mem_cgroup_move_swap_a > * Before starting migration, account PAGE_SIZE to mem_cgroup that the old > * page belongs to. > */ > -int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr) > +int mem_cgroup_prepare_migration(struct page *page, > + struct page *newpage, struct mem_cgroup **ptr) > { > struct page_cgroup *pc; > struct mem_cgroup *mem = NULL; > + enum charge_type ctype; > int ret = 0; > > if (mem_cgroup_disabled()) > @@ -2517,65 +2519,70 @@ int mem_cgroup_prepare_migration(struct > css_get(&mem->css); > } > unlock_page_cgroup(pc); > - > - if (mem) { > - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false); > - css_put(&mem->css); > - } > - *ptr = mem; > + /* > + * If the page is uncharged before migration (removed from radix-tree) > + * we return here. > + */ > + if (!mem) > + return 0; > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false); > + css_put(&mem->css); /* drop extra refcnt */ it should be: *ptr = mem; ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false); css_put(&mem->css); as Andrea has fixed already. > + if (ret) > + return ret; > + /* > + * The old page is under lock_page(). > + * If the old_page is uncharged and freed while migration, page migration > + * will fail and newpage will properly uncharged by end_migration. > + * And commit_charge against newpage never fails. > + */ > + pc = lookup_page_cgroup(newpage); > + if (PageAnon(page)) > + ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED; > + else if (!PageSwapBacked(page)) I think using page_is_file_cache() would be better. > + ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; > + else > + ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM; > + __mem_cgroup_commit_charge(mem, pc, ctype); > + /* FILE_MAPPED of this page will be updated at remap routine */ > return ret; > } > > /* remove redundant charge if migration failed*/ > void mem_cgroup_end_migration(struct mem_cgroup *mem, > - struct page *oldpage, struct page *newpage) > + struct page *oldpage, struct page *newpage) > { > - struct page *target, *unused; > - struct page_cgroup *pc; > - enum charge_type ctype; > + struct page *used, *unused; > > if (!mem) > return; > cgroup_exclude_rmdir(&mem->css); > + > + unnecessary extra line :) > /* at migration success, oldpage->mapping is NULL. */ > if (oldpage->mapping) { > - target = oldpage; > - unused = NULL; > + used = oldpage; > + unused = newpage; > } else { > - target = newpage; > + used = newpage; > unused = oldpage; > } > - > - if (PageAnon(target)) > - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED; > - else if (page_is_file_cache(target)) > - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; > - else > - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM; > - > - /* unused page is not on radix-tree now. */ > - if (unused) > - __mem_cgroup_uncharge_common(unused, ctype); > - > - pc = lookup_page_cgroup(target); > - /* > - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup. > - * So, double-counting is effectively avoided. > - */ > - __mem_cgroup_commit_charge(mem, pc, ctype); > - > + /* PageCgroupUsed() flag check will do all we want */ > + mem_cgroup_uncharge_page(unused); hmm... using mem_cgroup_uncharge_page() would be enough, but I think it doesn't show what we want: we must uncharge "unused" by all means in PageCgroupUsed case, and I feel it strange a bit to uncharge "unused" by mem_cgroup_uncharge_page(), if it *was* a cache page. So I think __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE) would be better, otherwise we need more comments to explain why mem_cgroup_uncharge_page() is enough. > /* > - * Both of oldpage and newpage are still under lock_page(). > - * Then, we don't have to care about race in radix-tree. > - * But we have to be careful that this page is unmapped or not. > - * > - * There is a case for !page_mapped(). At the start of > - * migration, oldpage was mapped. But now, it's zapped. > - * But we know *target* page is not freed/reused under us. > - * mem_cgroup_uncharge_page() does all necessary checks. > - */ > - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED) > - mem_cgroup_uncharge_page(target); > + * If old page was file cache, and removed from radix-tree > + * before lock_page(), perepare_migration doesn't charge and we never > + * reach here. > + * And if newpage was removed from radix-tree after unlock_page(), the context which removed it from radix-tree uncharges it properly, because it is charged at prepare_migration. right? > + * Considering ANON pages, we can't depend on lock_page. > + * If a page may be unmapped before it's remapped, new page's > + * mapcount will not increase. (case that mapcount 0->1 never occur.) > + * PageCgroupUsed() and SwapCache checks will be done. > + * > + * Once mapcount goes to 1, our hook to page_remove_rmap will do > + * enough jobs. > + */ > + if (PageAnon(used) && !page_mapped(used)) > + mem_cgroup_uncharge_page(used); mem_cgroup_uncharge_page() does the same check :) Thanks, Daisuke Nishimura. > /* > * At migration, we may charge account against cgroup which has no tasks > * So, rmdir()->pre_destroy() can be called while we do this charge. > Index: mmotm-temp/mm/migrate.c > =================================================================== > --- mmotm-temp.orig/mm/migrate.c > +++ mmotm-temp/mm/migrate.c > @@ -576,7 +576,7 @@ static int unmap_and_move(new_page_t get > } > > /* charge against new page */ > - charge = mem_cgroup_prepare_migration(page, &mem); > + charge = mem_cgroup_prepare_migration(page, newpage, &mem); > if (charge == -ENOMEM) { > rc = -ENOMEM; > goto unlock; > Index: mmotm-temp/include/linux/memcontrol.h > =================================================================== > --- mmotm-temp.orig/include/linux/memcontrol.h > +++ mmotm-temp/include/linux/memcontrol.h > @@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru > extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem); > > extern int > -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr); > +mem_cgroup_prepare_migration(struct page *page, > + struct page *newpage, struct mem_cgroup **ptr); > extern void mem_cgroup_end_migration(struct mem_cgroup *mem, > struct page *oldpage, struct page *newpage); > > @@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state > } > > static inline int > -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr) > +mem_cgroup_prepare_migration(struct page *page, struct page *newpage, > + struct mem_cgroup **ptr) > { > return 0; > } > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>