Hi, I've finally managed to untangle myself from internal stuff...

On Wed 18-06-14 16:40:44, Johannes Weiner wrote:
> The memcg charge API charges pages before they are rmapped - i.e. have
> an actual "type" - and so every callsite needs its own set of charge
> and uncharge functions to know what type is being operated on. Worse,
> uncharge has to happen from a context that is still type-specific,
> rather than at the end of the page's lifetime with exclusive access,
> and so requires a lot of synchronization.
> 
> Rewrite the charge API to provide a generic set of try_charge(),
> commit_charge() and cancel_charge() transaction operations, much like
> what's currently done for swap-in:
> 
>   mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
>   pages from the memcg if necessary.
> 
>   mem_cgroup_commit_charge() commits the page to the charge once it
>   has a valid page->mapping and PageAnon() reliably tells the type.
> 
>   mem_cgroup_cancel_charge() aborts the transaction.
> 
> This reduces the charge API and enables subsequent patches to
> drastically simplify uncharging.
> 
> As pages need to be committed after rmap is established but before
> they are added to the LRU, page_add_new_anon_rmap() must stop doing
> LRU additions again. Revive lru_cache_add_active_or_unevictable().

I think it would make more sense to do lru_cache_add_active_or_unevictable
in a separate patch for easier review. Too late, though...

A few comments below.

> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

The patch looks correct, but the code is quite tricky, so I hope I didn't
miss anything.

Acked-by: Michal Hocko <mhocko@xxxxxxx>

> ---
>  Documentation/cgroups/memcg_test.txt |  32 +--
>  include/linux/memcontrol.h           |  53 ++---
>  include/linux/swap.h                 |   3 +
>  kernel/events/uprobes.c              |   1 +
>  mm/filemap.c                         |   9 +-
>  mm/huge_memory.c                     |  57 +++--
>  mm/memcontrol.c                      | 407 ++++++++++++++---------------
>  mm/memory.c                          |  41 ++--
>  mm/rmap.c                            |  19 --
>  mm/shmem.c                           |  24 ++-
>  mm/swap.c                            |  34 +++
>  mm/swapfile.c                        |  14 +-
>  12 files changed, 314 insertions(+), 380 deletions(-)
> 
[...]
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index eb65d29516ca..1a9a096858e0 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -54,28 +54,11 @@ struct mem_cgroup_reclaim_cookie {
>  };
>  
>  #ifdef CONFIG_MEMCG
> -/*
> - * All "charge" functions with gfp_mask should use GFP_KERNEL or
> - * (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
> - * alloc memory but reclaims memory from all available zones. So, "where I want
> - * memory from" bits of gfp_mask has no meaning. So any bits of that field is
> - * available but adding a rule is better. charge functions' gfp_mask should
> - * be set to GFP_KERNEL or gfp_mask & GFP_RECLAIM_MASK for avoiding ambiguous
> - * codes.
> - * (Of course, if memcg does memory allocation in future, GFP_KERNEL is sane.)
> - */

I think we should slightly modify the comment, but the primary idea
should stay there. What about the following?

/*
 * Although memcg charge functions do not allocate any memory, they are
 * still given a GFP mask to control the reclaim process (therefore
 * gfp_mask & GFP_RECLAIM_MASK is expected).
 * GFP_KERNEL should be used for the general charge path without any
 * constraints on the reclaim.
 * __GFP_WAIT should be cleared for atomic contexts.
 * __GFP_NORETRY should be set for charges which might fail rather than
 * spend too much time reclaiming.
 * __GFP_NOFAIL should be set for charges which cannot fail.
 */
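Just to illustrate how I read these rules together with the new API - this
is not from the patch, charge_new_page() and the "atomic" parameter are
made up for the example; only mem_cgroup_try_charge/commit_charge/
cancel_charge and the GFP flags come from the series:

/*
 * Hypothetical call-site sketch showing the gfp conventions and the
 * try/commit/cancel pattern.  Not real kernel code.
 */
static int charge_new_page(struct page *page, struct mm_struct *mm,
			   bool atomic)
{
	gfp_t gfp = GFP_KERNEL;		/* general charge path */
	struct mem_cgroup *memcg;
	int ret;

	if (atomic)
		gfp &= ~__GFP_WAIT;	/* no sleeping reclaim from here */

	ret = mem_cgroup_try_charge(page, mm, gfp, &memcg);
	if (ret)
		return ret;

	/*
	 * Instantiate the page: set up page->mapping / rmap under the
	 * appropriate lock.  If that fails, abort the transaction with
	 * mem_cgroup_cancel_charge(page, memcg) instead of committing.
	 */

	mem_cgroup_commit_charge(page, memcg, false);
	return 0;
}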
> -
> -extern int mem_cgroup_charge_anon(struct page *page, struct mm_struct *mm,
> -				gfp_t gfp_mask);
> -/* for swap handling */
> -extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> -		struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
> -extern void mem_cgroup_commit_charge_swapin(struct page *page,
> -	struct mem_cgroup *memcg);
> -extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg);
> -
> -extern int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm,
> -					gfp_t gfp_mask);
> +int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
> +			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
> +void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
> +			      bool lrucare);
> +void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
>  
>  struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
>  struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
[...]
> @@ -948,6 +951,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
>  					struct page *page,
>  					unsigned long haddr)
>  {
> +	struct mem_cgroup *memcg;
>  	spinlock_t *ptl;
>  	pgtable_t pgtable;
>  	pmd_t _pmd;
> @@ -968,20 +972,21 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
>  					       __GFP_OTHER_NODE,
>  					       vma, address, page_to_nid(page));
>  		if (unlikely(!pages[i] ||
> -			     mem_cgroup_charge_anon(pages[i], mm,
> -						    GFP_KERNEL))) {
> +			     mem_cgroup_try_charge(pages[i], mm, GFP_KERNEL,
> +						   &memcg))) {
>  			if (pages[i])
>  				put_page(pages[i]);
> -			mem_cgroup_uncharge_start();
>  			while (--i >= 0) {
> -				mem_cgroup_uncharge_page(pages[i]);
> +				memcg = (void *)page_private(pages[i]);

Hmm, OK, the memcg couldn't go away even if the mm owner has left it,
because the charge is already there and the page is not on the LRU, so
mem_cgroup_css_free() will wait until we uncharge it or put it on the LRU.

> +				set_page_private(pages[i], 0);
> +				mem_cgroup_cancel_charge(pages[i], memcg);
>  				put_page(pages[i]);
>  			}
> -			mem_cgroup_uncharge_end();
>  			kfree(pages);
>  			ret |= VM_FAULT_OOM;
>  			goto out;
>  		}

I think this deserves a comment, something like:

		/*
		 * Pages might end up charged to a different memcg
		 * because the mm owner might move while we are allocating
		 * them. Abuse the ->private field to store the charged
		 * memcg until we know whether to commit or cancel the
		 * charge.
		 */

> +		set_page_private(pages[i], (unsigned long)memcg);
>  	}
>  
>  	for (i = 0; i < HPAGE_PMD_NR; i++) {
[...]
> +/**
> + * mem_cgroup_commit_charge - commit a page charge
> + * @page: page to charge
> + * @memcg: memcg to charge the page to
> + * @lrucare: page might be on LRU already
> + *
> + * Finalize a charge transaction started by mem_cgroup_try_charge(),
> + * after page->mapping has been set up.  This must happen atomically
> + * as part of the page instantiation, i.e. under the page table lock
> + * for anonymous pages, under the page lock for page and swap cache.
> + *
> + * In addition, the page must not be on the LRU during the commit, to
> + * prevent racing with task migration.  If it might be, use @lrucare.
> + *
> + * Use mem_cgroup_cancel_charge() to cancel the transaction instead.
> + */
> +void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
> +			      bool lrucare)

I think we should be explicit that this is only required for LRU pages.
kmem doesn't have to finalize the transaction.

[...]
-- 
Michal Hocko
SUSE Labs