[Cc Ying Huang]

On Fri 03-04-20 13:48:14, Prathu Baronia wrote:
> THP allocation for anon memory requires zeroing of the huge page. To do
> so, we iterate over 2MB memory in 4KB chunks. Each iteration calls for
> kmap_atomic() and kunmap_atomic(). This routine makes sense where we need
> a temporary mapping of the user page. In !HIGHMEM cases, especially on
> 64-bit architectures, we don't need a temp mapping. Hence, kmap_atomic()
> acts as nothing more than multiple barrier() calls.
>
> This called for optimization. Simply getting the VADDR from the page does
> the job for us. So, implement another (optimized) routine for
> clear_huge_page() which doesn't need a temporary mapping of the user
> space page.
>
> While testing this patch on Qualcomm SM8150 SoC (kernel v4.14.117), we
> see a 64% improvement in clear_huge_page().

This is an old kernel. Do you see the same with the current upstream
kernel? Btw. 60% improvement only from dropping barrier sounds unexpected
to me. Are you sure this is the only reason?

c79b57e462b5 ("mm: hugetlb: clear target sub-page last when clearing
huge page") is already in 4.14 AFAICS, is it possible that this is the
effect of that patch? Your patch is effectively disabling this
optimization for most workloads that really care about it. I strongly
doubt that hugetlb is a thing on 32b kernels these days. So this really
begs for more data about the real underlying problem IMHO.

> Ftrace results:
>
> Default profile:
> ------------------------------------------
>  6) ! 473.802 us  |  clear_huge_page();
> ------------------------------------------
>
> With this patch applied:
> ------------------------------------------
>  5) ! 170.156 us  |  clear_huge_page();
> ------------------------------------------
>
> Signed-off-by: Prathu Baronia <prathu.baronia@xxxxxxxxxxx>
> Reported-by: Chintan Pandya <chintan.pandya@xxxxxxxxxxx>
> ---
>  mm/memory.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3ee073d..3e120e8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5119,6 +5119,7 @@ EXPORT_SYMBOL(__might_fault);
>  #endif
>
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> +#ifdef CONFIG_HIGHMEM
>  static void clear_gigantic_page(struct page *page,
> 				unsigned long addr,
> 				unsigned int pages_per_huge_page)
> @@ -5183,6 +5184,16 @@ void clear_huge_page(struct page *page,
> 			    addr + right_idx * PAGE_SIZE);
> 	}
>  }
> +#else
> +void clear_huge_page(struct page *page,
> +		     unsigned long addr_hint, unsigned int pages_per_huge_page)
> +{
> +	void *addr;
> +
> +	addr = page_address(page);
> +	memset(addr, 0, pages_per_huge_page*PAGE_SIZE);
> +}
> +#endif
>
>  static void copy_user_gigantic_page(struct page *dst, struct page *src,
> 				    unsigned long addr,
> --
> 2.7.4
>
>
> --
> Prathu Baronia

--
Michal Hocko
SUSE Labs