On Thu, Apr 19, 2012 at 11:16 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> (2012/04/20 14:57), Ying Han wrote:
>
>> On Thu, Apr 19, 2012 at 5:37 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>>> (2012/04/19 22:12), Johannes Weiner wrote:
>>>> Plus this code runs for ALL uncharges, the unlikely() and preliminary
>>>> flag testing don't make it okay. It's bad that we have this in the
>>>> allocator, but at least it would be good to hook into that branch and
>>>> not add another one.
>>>>
>>>> pc->mem_cgroup stays intact after the uncharge. Could we make the
>>>> memcg removal path wait on the mlock counter to drop to zero instead
>>>> and otherwise keep Ying's version?
>>>>
>>>
>>> Handling the problem in the ->destroy() path? Hmm, that will work
>>> against use-after-free. But the accounting problem which may be caused
>>> by mem_cgroup_lru_add_list(), which overwrites pc->mem_cgroup, cannot
>>> be handled.
>>
>> Kame, can you clarify that? What does mem_cgroup_lru_add_list() have
>> to do with this problem?
>>
>
> It overwrites pc->mem_cgroup. Then, assume a task in cgroup "A":
>
> 1. page is charged:      pc->mem_cgroup = A, Used bit set
> 2. page is set Mlocked:  A's mlock counter += 1
> 3. page is uncharged:    Used bit cleared
> 4. page is added to lru: pc->mem_cgroup = root
> 5. page is freed:        root's mlock counter -= 1
>
> Then A's mlock counter is +1 and root's mlock counter is -1, IF
> free_pages() really handles mlocked pages...

Hmm, so the question now is whether TestClearPageMlocked() can only
happen between step 2 and step 3. If so, the mlock stat will be updated
correctly.

>>> But hmm, is this too slow?...
>>> ==
>>> mem_cgroup_uncharge_common()
>>> {
>>>     ....
>>>     if (PageSwapCache(page) || PageMlocked(page))
>>>         return NULL;
>>> }
>>>
>>> page_alloc.c::
>>>
>>> static inline void free_page_mlock(struct page *page)
>>> {
>>>     __dec_zone_page_state(page, NR_MLOCK);
>>>     __count_vm_event(UNEVICTABLE_MLOCKFREED);
>>>
>>>     mem_cgroup_uncharge_page(page);
>>> }
>>> ==
>>>
>>> BTW, on a brief reading of the code... why do we have hooks in
>>> free_page() at all?
>>>
>>> It seems do_munmap() and exit_mmap() call munlock_vma_pages_all().
>>> So it seems all vmas which have VM_MLOCKED are checked before freeing.
>>> vmscan never frees mlocked pages, I think.
>>>
>>> Any other path to free mlocked pages without munlock?
>>
>> I found the commit which introduced the hook in the freeing path, but
>> I couldn't get more detail on why it was introduced from the commit
>> description:
>>
>> commit 985737cf2ea096ea946aed82c7484d40defc71a8
>> Author: Lee Schermerhorn <lee.schermerhorn@xxxxxx>
>> Date:   Sat Oct 18 20:26:53 2008 -0700
>>
>>     mlock: count attempts to free mlocked page
>>
>>     Allow free of mlock()ed pages.  This shouldn't happen, but during
>>     development, it occasionally did.
>>
>>     This patch allows us to survive that condition, while keeping the
>>     statistics and events correct for debug.
>>
>>> I feel freeing Mlocked pages is a cause of problems.
>>
>
> Sigh... "This shouldn't happen"!!!!!
>
> How about adding a warning to the free_page() path and removing your
> current hook?

That does make things a lot simpler. I will wait a bit in case someone
remembers a counterexample.

--Ying

> Thanks,
> -Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/