(2012/04/19 22:12), Johannes Weiner wrote: > On Thu, Apr 19, 2012 at 09:59:20AM +0900, KAMEZAWA Hiroyuki wrote: >> (2012/04/19 8:33), Andrew Morton wrote: >> >>> On Wed, 18 Apr 2012 11:21:55 -0700 >>> Ying Han <yinghan@xxxxxxxxxx> wrote: >>>> static void __free_pages_ok(struct page *page, unsigned int order) >>>> { >>>> unsigned long flags; >>>> - int wasMlocked = __TestClearPageMlocked(page); >>>> + bool locked; >>>> >>>> if (!free_pages_prepare(page, order)) >>>> return; >>>> >>>> local_irq_save(flags); >>>> - if (unlikely(wasMlocked)) >>>> + mem_cgroup_begin_update_page_stat(page, &locked, &flags); >>> >>> hm, what's going on here. The page now has a zero refcount and is to >>> be returned to the buddy. But mem_cgroup_begin_update_page_stat() >>> assumes that the page still belongs to a memcg. I'd have thought that >>> any page_cgroup backreferences would have been torn down by now? >>> >>>> + if (unlikely(__TestClearPageMlocked(page))) >>>> free_page_mlock(page); >>> >> >> >> Ah, this is problem. Now, we have following code. >> == >> >>> struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page, >>> enum lru_list lru) >>> { >>> struct mem_cgroup_per_zone *mz; >>> struct mem_cgroup *memcg; >>> struct page_cgroup *pc; >>> >>> if (mem_cgroup_disabled()) >>> return &zone->lruvec; >>> >>> pc = lookup_page_cgroup(page); >>> memcg = pc->mem_cgroup; >>> >>> /* >>> * Surreptitiously switch any uncharged page to root: >>> * an uncharged page off lru does nothing to secure >>> * its former mem_cgroup from sudden removal. >>> * >>> * Our caller holds lru_lock, and PageCgroupUsed is updated >>> * under page_cgroup lock: between them, they make all uses >>> * of pc->mem_cgroup safe. >>> */ >>> if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) >>> pc->mem_cgroup = memcg = root_mem_cgroup; >> >> == >> >> Then, accessing pc->mem_cgroup without checking PCG_USED bit is dangerous. >> It may trigger #GP because of suddern removal of memcg or because of above >> code, mis-accounting will happen... pc->mem_cgroup may be overwritten already. >> >> Proposal from me is calling TestClearPageMlocked(page) via mem_cgroup_uncharge(). >> >> Like this. >> == >> mem_cgroup_charge_statistics(memcg, anon, -nr_pages); >> >> /* >> * Pages reach here when it's fully unmapped or dropped from file cache. >> * we are under lock_page_cgroup() and have no race with memcg activities. >> */ >> if (unlikely(PageMlocked(page))) { >> if (TestClearPageMlocked()) >> decrement counter. >> } >> >> ClearPageCgroupUsed(pc); >> == >> But please check performance impact... > > This makes the lifetime rules of mlocked anon really weird. > yes. > Plus this code runs for ALL uncharges, the unlikely() and preliminary > flag testing don't make it okay. It's bad that we have this in the > allocator, but at least it would be good to hook into that branch and > not add another one. > > pc->mem_cgroup stays intact after the uncharge. Could we make the > memcg removal path wait on the mlock counter to drop to zero instead > and otherwise keep Ying's version? > handling problem in ->destroy() path ? Hmm, it will work against use-after-free. But accounting problem which may be caused by mem_cgroup_lru_add_list() cannot be handled, which overwrites pc->mem_cgroup. But hm, is this too slow ?... == mem_cgroup_uncharge_common() { .... if (PageSwapCache(page) || PageMlocked(page)) return NULL; } page_alloc.c:: static inline void free_page_mlock(struct page *page) { __dec_zone_page_state(page, NR_MLOCK); __count_vm_event(UNEVICTABLE_MLOCKFREED); mem_cgroup_uncharge_page(page); } == BTW, at reading code briefly....why we have hooks in free_page() ? It seems do_munmap() and exit_mmap() calls munlock_vma_pages_all(). So, it seems all vmas which has VM_MLOCKED are checked before freeing. vmscan never frees mlocked pages, I think. Any other path to free mlocked pages without munlock ? I feel freeing Mlocked page is a cause of problems. Thanks, -Kame Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>