On Wed 15-04-20 11:40:42, Huang, Ying wrote:
> Alexander Duyck <alexander.duyck@xxxxxxxxx> writes:
[...]
> > If you take a look at commit c6ddfb6c58903 ("mm, clear_huge_page: move
> > order algorithm into a separate function") they were running the tests
> > on multiple threads simultaneously as their concern was flooding the
> > LLC cache. I wonder if we couldn't look at bypassing the cache
> > entirely using something like __copy_user_nocache for some portion of
> > the copy and then only copy in the last pieces that we think will be
> > immediately accessed.
>
> The problem is how to determine the size of the pieces that will be
> immediately accessed?

Well, this really depends. If you are in a page fault path then it
should be quite obvious that at least the faulting subpage will be
accessed. It is hard to make any assumptions beyond that.

THP might behave very differently from hugetlb pages, because the
former is an optimistic optimization and the rest of the page might not
be used immediately. Hugetlb pages, on the other hand, are more likely
to use a larger part of the page, because otherwise there is a clear
memory loss.

Then you have MAP_POPULATE or the like, and there optimizing for future
access sounds like a pointless exercise, because this is essentially a
stream initialization without any clue which memory will be used
shortly.

All that being said, I am not against optimizing clear_huge_page, but
please stick to some real life usecases which actually benefit from the
optimization. If there are any arch specific nuances then make them
arch specific. Focusing on microbenchmarks just leads to complex code
which might turn out suboptimal in some cases.

--
Michal Hocko
SUSE Labs
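
To make the idea being discussed a bit more concrete, here is a minimal
userspace sketch of the "clear with non-temporal stores, keep the
faulting subpage cached" approach. This is illustrative only: it is not
kernel code and not what clear_huge_page() actually does. It assumes
x86 with SSE2, a 2MB region, 4KB subpages, and a hypothetical
fault-address hint; every subpage except the faulting one is cleared
with cache-bypassing stores, and the faulting subpage is cleared last
with ordinary memset so it stays hot in cache for the first access.

/*
 * Illustrative userspace sketch only -- not kernel code and not what
 * clear_huge_page() does.  Clears a 2MB region subpage by subpage with
 * non-temporal (cache-bypassing) stores, skipping the subpage that
 * contains the (hypothetical) faulting address, and clears that subpage
 * last with ordinary cached stores.  Requires x86 with SSE2.
 */
#include <emmintrin.h>	/* _mm_stream_si128, _mm_setzero_si128, _mm_sfence */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define HUGE_SIZE	(2UL << 20)	/* assume a 2MB huge page */
#define SUBPAGE_SIZE	4096UL		/* assume 4KB base pages */

/* Zero one subpage with non-temporal stores, bypassing the cache. */
static void clear_subpage_nocache(char *sp)
{
	__m128i zero = _mm_setzero_si128();
	size_t off;

	for (off = 0; off < SUBPAGE_SIZE; off += 16)
		_mm_stream_si128((__m128i *)(sp + off), zero);
}

/* Clear the whole region, keeping the faulting subpage cache-hot. */
static void clear_huge_region(char *base, char *fault_addr)
{
	size_t fault_idx = (size_t)(fault_addr - base) / SUBPAGE_SIZE;
	size_t i;

	for (i = 0; i < HUGE_SIZE / SUBPAGE_SIZE; i++) {
		if (i == fault_idx)
			continue;	/* leave the faulting subpage for last */
		clear_subpage_nocache(base + i * SUBPAGE_SIZE);
	}
	_mm_sfence();	/* order the non-temporal stores */

	/* Ordinary cached stores for the subpage touched first after the fault. */
	memset(base + fault_idx * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
}

int main(void)
{
	char *buf = aligned_alloc(SUBPAGE_SIZE, HUGE_SIZE);

	if (!buf)
		return 1;
	/* Pretend the fault hit somewhere in the middle of the region. */
	clear_huge_region(buf, buf + HUGE_SIZE / 2 + 123);
	printf("cleared %lu bytes\n", HUGE_SIZE);
	free(buf);
	return 0;
}

Whether anything like this pays off in practice is exactly the point
above: it only helps if the faulting-subpage heuristic matches how real
workloads actually touch the rest of the huge page.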