On Sat 30-10-21 00:12:53, ning zhang wrote:
>
> On 2021/10/29 9:38 PM, Michal Hocko wrote:
> > On Thu 28-10-21 19:56:49, Ning Zhang wrote:
> > > As we know, thp may lead to memory bloat which may cause OOM.
> > > Through testing with some apps, we found that the reason of
> > > memory bloat is a huge page may contain some zero subpages
> > > (may accessed or not). And we found that most zero subpages
> > > are centralized in a few huge pages.
> > >
> > > Following is a text_classification_rnn case for tensorflow:
> > >
> > >   zero_subpages   huge_pages   waste
> > >   [   0,   1)        186        0.00%
> > >   [   1,   2)         23        0.01%
> > >   [   2,   4)         36        0.02%
> > >   [   4,   8)         67        0.08%
> > >   [   8,  16)         80        0.23%
> > >   [  16,  32)        109        0.61%
> > >   [  32,  64)         44        0.49%
> > >   [  64, 128)         12        0.30%
> > >   [ 128, 256)         28        1.54%
> > >   [ 256, 513)        159       18.03%
> > >
> > > In the case, there are 187 huge pages (25% of the total huge pages)
> > > which contain more than 128 zero subpages. And these huge pages
> > > lead to 19.57% waste of the total rss. It means we can reclaim
> > > 19.57% memory by splitting the 187 huge pages and reclaiming the
> > > zero subpages.
> > What is the THP policy configuration in your testing? I assume you are
> > using defaults right? That would be always for THP and madvise for
> > defrag. Would it make more sense to use madvise mode for THP for your
> > workload? The THP code is rather complex and just by looking at the
> > diffstat this add quite a lot on top. Is this really worth it?
>
> The THP configuration is always.
>
> Madvise needs users to set MADV_HUGEPAGE by themselves if they want to use
> huge pages, while many users don't set this, and they can't control this
> well.

What do you mean they can't control this well?

> Such as java, users can set heap and metaspace to use huge pages with
> madvise, but there is also memory bloat. Users still need to test whether
> their app can accept the waste.

There will always be some internal fragmentation when huge pages are used.
The amount will depend on how well the memory is used, but huge pages give
a performance boost in return. If the memory bloat is a significant
problem then overeager THP usage is certainly not good, and I would argue
that applying the THP always policy is not a proper configuration. No
matter how much the MM code tries to fix up the situation, it will always
be a catch-up game.

> For the case above, if we set THP configuration to be madvise, all the pages
> it uses will be 4K-page.
>
> Memory bloat is one of the most important reasons that users disable THP.
> We do this to popularize THP to be default enabled.

To my knowledge the most popular reason to disable THP is the runtime
overhead. A large part of that overhead has been reduced by not doing
heavy compaction during the page fault allocations by default. Memory
overhead is certainly an important aspect as well, but there is always a
possibility to reduce it by restricting page fault THP to madvised
regions (i.e. those where the author of the code has considered the costs
vs. benefits of the huge page) and setting up a conservative khugepaged
policy.

So there are existing tools available. You are trying to add quite a lot
of code, so you should have good arguments to add more complexity. I am
not sure that popularizing THP is a strong one TBH.
-- 
Michal Hocko
SUSE Labs
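
As a side note on the policy question raised in this thread, here is a
minimal sketch of how the active THP and defrag modes could be checked
before testing. It assumes the usual sysfs location
/sys/kernel/mm/transparent_hugepage and the bracketed-choice format of its
files (e.g. "always [madvise] never"). The helper names active_choice and
read_thp_policy are made up for illustration; they are not part of any
kernel tooling.

```python
from pathlib import Path

# Assumed sysfs location of the THP controls on a Linux system.
THP_SYSFS = Path("/sys/kernel/mm/transparent_hugepage")


def active_choice(text):
    """Return the bracketed (active) value from a line such as
    'always [madvise] never', or None if nothing is bracketed."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_policy(base=THP_SYSFS):
    """Read the active 'enabled' and 'defrag' modes.

    A value is None when the file cannot be read, for example when THP
    is not built into the kernel or the script does not run on Linux.
    """
    policy = {}
    for name in ("enabled", "defrag"):
        try:
            policy[name] = active_choice((base / name).read_text())
        except OSError:
            policy[name] = None
    return policy


if __name__ == "__main__":
    print(read_thp_policy())
```

Running it only reads the current settings; switching a system to madvise
mode still means writing the chosen value to the same files as root.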