On Tue, Oct 21, 2014 at 04:52:52PM +0400, Vladimir Davydov wrote:
> On Mon, Oct 20, 2014 at 11:22:09AM -0400, Johannes Weiner wrote:
> > mem_cgroup_swapout() is called with exclusive access to the page at
> > the end of the page's lifetime.  Instead of clearing the PCG_MEMSW
> > flag and deferring the uncharge, just do it right away.  This allows
> > follow-up patches to simplify the uncharge code.
> > 
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > ---
> >  mm/memcontrol.c | 17 +++++++++++++----
> >  1 file changed, 13 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bea3fddb3372..7709f17347f3 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5799,6 +5799,7 @@ static void __init enable_swap_cgroup(void)
> >   */
> >  void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> >  {
> > +	struct mem_cgroup *memcg;
> >  	struct page_cgroup *pc;
> >  	unsigned short oldid;
> >  
> > @@ -5815,13 +5816,21 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> >  		return;
> >  
> >  	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEMSW), page);
> > +	memcg = pc->mem_cgroup;
> >  
> > -	oldid = swap_cgroup_record(entry, mem_cgroup_id(pc->mem_cgroup));
> > +	oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
> >  	VM_BUG_ON_PAGE(oldid, page);
> > +	mem_cgroup_swap_statistics(memcg, true);
> >  
> > -	pc->flags &= ~PCG_MEMSW;
> > -	css_get(&pc->mem_cgroup->css);
> > -	mem_cgroup_swap_statistics(pc->mem_cgroup, true);
> > +	pc->flags = 0;
> > +
> > +	if (!mem_cgroup_is_root(memcg))
> > +		page_counter_uncharge(&memcg->memory, 1);
> 
> AFAIU it removes batched uncharge of swapped out pages, doesn't it? Will
> it affect performance?

During swapout and with lockless page counters?  I don't think so.

> Besides, it looks asymmetric with respect to the page cache uncharge
> path, where we still defer uncharge to mem_cgroup_uncharge_list(), and I
> personally rather dislike this asymmetry.

The asymmetry is inherent in the fact that we have memory and
memory+swap accounting, and here a memory charge is transferred out to
swap.  Before, the asymmetry was in mem_cgroup_uncharge_list() where
we separate out memory and memsw pages (which the next patch fixes).

So nothing changed, the ugliness was just moved around.  I actually
like it better now that it's part of the swap controller, because
that's where the nastiness actually comes from.  This will all go away
when we account swap separately.  Then, swapped pages can keep their
memory charge until mem_cgroup_uncharge() again and the swap charge
will be completely independent from it.  This reshuffling is just
necessary because it allows us to get rid of the per-page flag.

> > +	local_irq_disable();
> > +	mem_cgroup_charge_statistics(memcg, page, -1);
> > +	memcg_check_events(memcg, page);
> > +	local_irq_enable();
> 
> AFAICT mem_cgroup_swapout() is called under mapping->tree_lock with irqs
> disabled, so we should use irq_save/restore here.

Good catch!  I don't think this function actually needs to be called
under the tree_lock, so I'd rather send a follow-up that moves it out.
For now, this should be sufficient:

---
From 3a40bd3b85a70db104ade873007dbb84b5117993 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@xxxxxxxxxxx>
Date: Tue, 21 Oct 2014 16:53:14 -0400
Subject: [patch] mm: memcontrol: uncharge pages on swapout fix

Vladimir notes:

> > +	local_irq_disable();
> > +	mem_cgroup_charge_statistics(memcg, page, -1);
> > +	memcg_check_events(memcg, page);
> > +	local_irq_enable();
> 
> AFAICT mem_cgroup_swapout() is called under mapping->tree_lock with irqs
> disabled, so we should use irq_save/restore here.

But this function doesn't actually need to be called under the tree
lock.  So for now, simply remove the irq-disabling altogether and rely
on the caller's IRQ state.  Later on, we'll move it out from there and
add back the simple, non-saving IRQ-disabling.

Reported-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
---
 mm/memcontrol.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8dc46aa9ae8f..c688fb73ff35 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5806,6 +5806,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 
+	/* XXX: caller holds IRQ-safe mapping->tree_lock */
+	VM_BUG_ON(!irqs_disabled());
+
 	if (!do_swap_account)
 		return;
 
@@ -5827,10 +5830,8 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	if (!mem_cgroup_is_root(memcg))
 		page_counter_uncharge(&memcg->memory, 1);
 
-	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, page, -1);
 	memcg_check_events(memcg, page);
-	local_irq_enable();
 }
 
 /**
-- 
2.1.2
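
For anyone following along, the IRQ-state problem Vladimir points out can be
illustrated with a small userspace toy model.  This is only a sketch: every
toy_* name is hypothetical and merely mimics the shape of local_irq_disable(),
local_irq_enable() and the irq_save/restore pair, and a plain bool stands in
for the CPU interrupt flag.  It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CPU interrupt flag; true means IRQs are enabled. */
static bool toy_irqs_enabled = true;

/* Hypothetical analogues of the non-saving primitives. */
static void toy_irq_disable(void) { toy_irqs_enabled = false; }
static void toy_irq_enable(void)  { toy_irqs_enabled = true; }

/* Hypothetical analogue of the saving variant: remember, then disable. */
static bool toy_irq_save(void)
{
	bool was_enabled = toy_irqs_enabled;

	toy_irqs_enabled = false;
	return was_enabled;
}

/* Put the flag back to whatever the caller had before toy_irq_save(). */
static void toy_irq_restore(bool was_enabled)
{
	toy_irqs_enabled = was_enabled;
}

/* Shape of the original patch: unconditional disable/enable pair. */
static void toy_callee_plain(void)
{
	toy_irq_disable();
	/* ... statistics update would go here ... */
	toy_irq_enable();	/* turns IRQs on even if the caller had them off */
}

/* Shape of the suggested alternative: save and restore the caller's state. */
static void toy_callee_save(void)
{
	bool flags = toy_irq_save();

	/* ... statistics update would go here ... */
	toy_irq_restore(flags);
}

/* Shape of the fix above: touch nothing, just assert the caller's state. */
static void toy_callee_rely_on_caller(void)
{
	assert(!toy_irqs_enabled);
	/* ... statistics update would go here ... */
}

/*
 * Run a callee as if under an IRQ-safe lock and report whether IRQs are
 * still disabled when the callee returns.
 */
static bool toy_irqs_stay_off_under_lock(void (*callee)(void))
{
	bool still_off;

	toy_irq_disable();		/* models taking the IRQ-safe lock */
	callee();
	still_off = !toy_irqs_enabled;
	toy_irq_enable();		/* models dropping the lock */
	return still_off;
}
```

Passing toy_callee_plain to toy_irqs_stay_off_under_lock() reports that the
flag was switched back on inside the "locked" region, which is the bug in
question; the saving variant and the rely-on-caller variant both leave the
caller's state alone.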