> On May 12, 2020, at 5:58 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote:
>>> On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>>>
>>> With the page->mapping requirement gone from memcg, we can charge anon
>>> and file-thp pages in one single step, right after they're allocated.
>>>
>>> This removes two out of three API calls - especially the tricky commit
>>> step that needed to happen at just the right time between when the
>>> page is "set up" and when it's "published" - somewhat vague and fluid
>>> concepts that varied by page type. All we need is a freshly allocated
>>> page and a memcg context to charge.
>>>
>>> v2: prevent double charges on pre-allocated hugepages in khugepaged
>>>
>>> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>>> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
>>> ---
>>>  include/linux/mm.h      |  4 +---
>>>  kernel/events/uprobes.c | 11 +++--------
>>>  mm/filemap.c            |  2 +-
>>>  mm/huge_memory.c        |  9 +++------
>>>  mm/khugepaged.c         | 35 ++++++++++-------------------------
>>>  mm/memory.c             | 36 ++++++++++--------------------------
>>>  mm/migrate.c            |  5 +----
>>>  mm/swapfile.c           |  6 +-----
>>>  mm/userfaultfd.c        |  5 +----
>>>  9 files changed, 31 insertions(+), 82 deletions(-)
>> []
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>
>>> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
>>>  out_up_write:
>>>  	up_write(&mm->mmap_sem);
>>>  out_nolock:
>>> +	if (*hpage)
>>> +		mem_cgroup_uncharge(*hpage);
>>>  	trace_mm_collapse_huge_page(mm, isolated, result);
>>>  	return;
>>>  out:
>>> -	mem_cgroup_cancel_charge(new_page, memcg);
>>>  	goto out_up_write;
>>>  }
>> []
>>
>> Some memory pressure will crash this new code. It looks somewhat racy.
>>
>> if (!page->mem_cgroup)
>>
>> where page == NULL in mem_cgroup_uncharge().
>
> Thanks for the report, sorry about the inconvenience.
>
> Hm, the page is exclusive at this point, nobody else should be
> touching it. After all, khugepaged might reuse the preallocated page
> for another pmd if this one fails to collapse.
>
> Looking at the code, I think it's the page pointer itself that's
> garbage, not page->mem_cgroup changing. If you have CONFIG_NUMA and
> the allocation fails, *hpage could contain an ERR_PTR instead of
> being NULL.
>
> I think we need the following fixlet:

Yes, I have NUMA here. Stephen, can you pick this up first, before
Andrew has a chance to push out the next mmotm, which will hopefully
contain this fix?

https://lore.kernel.org/lkml/20200512215813.GA487759@xxxxxxxxxxx/

>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index f2e0a5e5cfbb..f6161e17da26 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  out_up_write:
>  	up_write(&mm->mmap_sem);
>  out_nolock:
> -	if (*hpage)
> +	if (!IS_ERR_OR_NULL(*hpage))
>  		mem_cgroup_uncharge(*hpage);
>  	trace_mm_collapse_huge_page(mm, isolated, result);
>  	return;
> @@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm,
>  	unlock_page(new_page);
>  out:
>  	VM_BUG_ON(!list_empty(&pagelist));
> -	if (*hpage)
> +	if (!IS_ERR_OR_NULL(*hpage))
>  		mem_cgroup_uncharge(*hpage);
>  	/* TODO: tracepoints */
>  }
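For anyone following along, here is a minimal userspace sketch of why
the plain NULL test is insufficient. This is not kernel code; it only
mirrors the ERR_PTR()/IS_ERR_OR_NULL() semantics from
include/linux/err.h, where an error pointer encodes a negative errno
in the top MAX_ERRNO bytes of the address space and is therefore
non-NULL and truthy:

#include <errno.h>
#include <stdio.h>

#define MAX_ERRNO	4095

/* Encode a negative errno as a pointer value, like the kernel macro. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* True for NULL and for pointers in the top MAX_ERRNO bytes. */
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

int main(void)
{
	/* e.g. what a failed preallocation can leave in *hpage */
	void *hpage = ERR_PTR(-ENOMEM);

	if (hpage)
		printf("NULL check passes -> bogus pointer reaches mem_cgroup_uncharge()\n");
	if (IS_ERR_OR_NULL(hpage))
		printf("IS_ERR_OR_NULL() correctly flags it\n");
	return 0;
}

So mem_cgroup_uncharge(*hpage) dereferences a garbage pointer when
*hpage holds an ERR_PTR, which is exactly the crash reported above;
only the IS_ERR_OR_NULL() form treats both NULL and an encoded errno
as "no page".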