At page migration, the new page is unlocked before calling end_migration(). This is mis-understanding with page-migration code of memcg. And FILE_MAPPED of migrated file cache is not properly updated, now. At migrating mapped file, events happens in following sequence. 1. allocate a new page. 2. get memcg of an old page. 3. charge ageinst new page, before migration. But at this point no changes to page_cgroup, no commit-charge. 4. page migration replaces radix-tree, old-page and new-page. 5. page migration remaps the new page if the old page was mapped. 6. memcg commits the charge for newpage. Because "commit" happens after page-remap, we lose file_mapped accounting information at migration. For fixing all, this changes parepare/end migration. New migration sequence with memcg is: 1. allocate a new page. 2. charge against a new page onto old page's memcg. (here, new page's pc is marked as PageCgroupUsed.) 3. page migration replaces radix-tree, page table, etc... 4. At remapping, FILE_MAPPED will be properly updated. Changelog: 2010/04/14 - updated onto the latest mmotm + page-compaction, etc... - fixed __try_charge() bypass case. - use CHARGE_TYPE_FORCE for uncharging an unused page. Reported-by: Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> --- include/linux/memcontrol.h | 6 +- mm/memcontrol.c | 96 ++++++++++++++++++++++++--------------------- mm/migrate.c | 2 3 files changed, 58 insertions(+), 46 deletions(-) Index: mmotm-temp/mm/memcontrol.c =================================================================== --- mmotm-temp.orig/mm/memcontrol.c +++ mmotm-temp/mm/memcontrol.c @@ -2505,10 +2505,12 @@ static inline int mem_cgroup_move_swap_a * Before starting migration, account PAGE_SIZE to mem_cgroup that the old * page belongs to. */ -int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr) +int mem_cgroup_prepare_migration(struct page *page, + struct page *newpage, struct mem_cgroup **ptr) { struct page_cgroup *pc; struct mem_cgroup *mem = NULL; + enum charge_type ctype; int ret = 0; if (mem_cgroup_disabled()) @@ -2521,65 +2523,73 @@ int mem_cgroup_prepare_migration(struct css_get(&mem->css); } unlock_page_cgroup(pc); - + /* + * If the page is uncharged before migration (removed from radix-tree) + * we return here. + */ + if (!mem) + return 0; *ptr = mem; - if (mem) { - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false); - css_put(&mem->css); - } + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false); + css_put(&mem->css); /* drop extra refcnt */ + /* + * When we get *ptr==NULL, it means the current is being killed and + * we skipped account charge, In this case, make migration failed. + */ + if (ret || *ptr == NULL) + return -ENOMEM; + /* + * The old page is under lock_page(). + * If the old_page is uncharged and freed while migration, page + * migration will fail and newpage will properly uncharged. + * This commit_charge against newpage never fails. + */ + pc = lookup_page_cgroup(newpage); + if (PageAnon(page)) + ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED; + else if (page_is_file_cache(page)) + ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; + else + ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM; + __mem_cgroup_commit_charge(mem, pc, ctype); + /* FILE_MAPPED of this page will be updated at remap routine */ return ret; } /* remove redundant charge if migration failed*/ void mem_cgroup_end_migration(struct mem_cgroup *mem, - struct page *oldpage, struct page *newpage) + struct page *oldpage, struct page *newpage) { - struct page *target, *unused; - struct page_cgroup *pc; - enum charge_type ctype; + struct page *used, *unused; if (!mem) return; cgroup_exclude_rmdir(&mem->css); /* at migration success, oldpage->mapping is NULL. */ if (oldpage->mapping) { - target = oldpage; - unused = NULL; + used = oldpage; + unused = newpage; } else { - target = newpage; + used = newpage; unused = oldpage; } - - if (PageAnon(target)) - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED; - else if (page_is_file_cache(target)) - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; - else - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM; - - /* unused page is not on radix-tree now. */ - if (unused) - __mem_cgroup_uncharge_common(unused, ctype); - - pc = lookup_page_cgroup(target); + /* PageCgroupUsed() flag check will do all we want */ + __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE); /* - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup. - * So, double-counting is effectively avoided. - */ - __mem_cgroup_commit_charge(mem, pc, ctype); - - /* - * Both of oldpage and newpage are still under lock_page(). - * Then, we don't have to care about race in radix-tree. - * But we have to be careful that this page is unmapped or not. - * - * There is a case for !page_mapped(). At the start of - * migration, oldpage was mapped. But now, it's zapped. - * But we know *target* page is not freed/reused under us. - * mem_cgroup_uncharge_page() does all necessary checks. - */ - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED) - mem_cgroup_uncharge_page(target); + * If old page was file cache, and removed from radix-tree + * before lock_page(), perepare_migration doesn't charge and we never + * reach here. + * + * Considering ANON pages, we can't depend on lock_page. + * If a page is unmapped before it's remapped, new page's + * mapcount doesn't ncrease. (case that mapcount 0->1 never occur.) + * and we have no chance to uncharge it. + * + * Once mapcount goes to 1, our hook to page_remove_rmap will do + * enough jobs. + */ + if (PageAnon(used)) + mem_cgroup_uncharge_page(used); /* * At migration, we may charge account against cgroup which has no tasks * So, rmdir()->pre_destroy() can be called while we do this charge. Index: mmotm-temp/mm/migrate.c =================================================================== --- mmotm-temp.orig/mm/migrate.c +++ mmotm-temp/mm/migrate.c @@ -581,7 +581,7 @@ static int unmap_and_move(new_page_t get } /* charge against new page */ - charge = mem_cgroup_prepare_migration(page, &mem); + charge = mem_cgroup_prepare_migration(page, newpage, &mem); if (charge == -ENOMEM) { rc = -ENOMEM; goto unlock; Index: mmotm-temp/include/linux/memcontrol.h =================================================================== --- mmotm-temp.orig/include/linux/memcontrol.h +++ mmotm-temp/include/linux/memcontrol.h @@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem); extern int -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr); +mem_cgroup_prepare_migration(struct page *page, + struct page *newpage, struct mem_cgroup **ptr); extern void mem_cgroup_end_migration(struct mem_cgroup *mem, struct page *oldpage, struct page *newpage); @@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state } static inline int -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr) +mem_cgroup_prepare_migration(struct page *page, struct page *newpage, + struct mem_cgroup **ptr) { return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>