On Wed 15-04-20 11:40:42, Huang, Ying wrote:
> Alexander Duyck <alexander.duyck@xxxxxxxxx> writes:
[...]
> > If you take a look at commit c6ddfb6c58903 ("mm, clear_huge_page: move
> > order algorithm into a separate function") they were running the tests
> > on multiple threads simultaneously as their concern was flooding the
> > LLC cache. I wonder if we couldn't look at bypassing the cache
> > entirely using something like __copy_user_nocache for some portion of
> > the copy and then only copy in the last pieces that we think will be
> > immediately accessed.
>
> The problem is how to determine the size of the pieces that will be
> immediately accessed?

Well, this really depends. If you are in a page fault path then it
should be quite obvious that at least the faulting subpage will be
accessed. It is hard to make any assumptions beyond that.

THP might behave very differently from hugetlb pages, because the
former is an optimistic optimization and the rest of the page might not
be used immediately. Hugetlb pages, on the other hand, are more likely
to use a larger part of the page, because otherwise there is a clear
memory loss.

Then you have MAP_POPULATE or the like, and there optimizing for future
access sounds like a pointless exercise, because this is essentially a
stream initialization without any clue which memory will be used
shortly.

All that being said, I am not against optimizing clear_huge_page, but
please stick to some real life usecases which actually benefit from the
optimization. If there are any arch specific nuances then make them
arch specific. Focusing on microbenchmarks just leads to complex code
which might turn out suboptimal in some cases.

--
Michal Hocko
SUSE Labs
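
To make the idea being discussed a bit more concrete, here is a minimal
userspace sketch of the "clear with non-temporal stores, keep the
faulting subpage cached" approach. This is illustrative only: it is not
kernel code and not what clear_huge_page() actually does. It assumes
x86 with SSE2, a 2MB region, 4KB subpages, and a hypothetical
fault-address hint; every subpage except the faulting one is cleared
with cache-bypassing stores, and the faulting subpage is cleared last
with ordinary memset so it stays hot in cache for the first access.

/*
 * Illustrative userspace sketch only -- not kernel code and not what
 * clear_huge_page() does.  Clears a 2MB region subpage by subpage with
 * non-temporal (cache-bypassing) stores, skipping the subpage that
 * contains the (hypothetical) faulting address, and clears that subpage
 * last with ordinary cached stores.  Requires x86 with SSE2.
 */
#include <emmintrin.h>	/* _mm_stream_si128, _mm_setzero_si128, _mm_sfence */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define HUGE_SIZE	(2UL << 20)	/* assume a 2MB huge page */
#define SUBPAGE_SIZE	4096UL		/* assume 4KB base pages */

/* Zero one subpage with non-temporal stores, bypassing the cache. */
static void clear_subpage_nocache(char *sp)
{
	__m128i zero = _mm_setzero_si128();
	size_t off;

	for (off = 0; off < SUBPAGE_SIZE; off += 16)
		_mm_stream_si128((__m128i *)(sp + off), zero);
}

/* Clear the whole region, keeping the faulting subpage cache-hot. */
static void clear_huge_region(char *base, char *fault_addr)
{
	size_t fault_idx = (size_t)(fault_addr - base) / SUBPAGE_SIZE;
	size_t i;

	for (i = 0; i < HUGE_SIZE / SUBPAGE_SIZE; i++) {
		if (i == fault_idx)
			continue;	/* leave the faulting subpage for last */
		clear_subpage_nocache(base + i * SUBPAGE_SIZE);
	}
	_mm_sfence();	/* order the non-temporal stores */

	/* Ordinary cached stores for the subpage touched first after the fault. */
	memset(base + fault_idx * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
}

int main(void)
{
	char *buf = aligned_alloc(SUBPAGE_SIZE, HUGE_SIZE);

	if (!buf)
		return 1;
	/* Pretend the fault hit somewhere in the middle of the region. */
	clear_huge_region(buf, buf + HUGE_SIZE / 2 + 123);
	printf("cleared %lu bytes\n", HUGE_SIZE);
	free(buf);
	return 0;
}

Whether anything like this pays off in practice is exactly the point
above: it only helps if the faulting-subpage heuristic matches how real
workloads actually touch the rest of the huge page.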