Re: [RFC 0/6] Reclaim zero subpages of thp to avoid memory bloat

"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> · Thu, 28 Oct 2021 17:13:33 +0300

On Thu, Oct 28, 2021 at 07:56:49PM +0800, Ning Zhang wrote:
> As we know, thp may lead to memory bloat which may cause OOM.
> Through testing with some apps, we found that the reason of
> memory bloat is a huge page may contain some zero subpages
> (may accessed or not). And we found that most zero subpages
> are centralized in a few huge pages.
> 
> Following is a text_classification_rnn case for tensorflow:
> 
>   zero_subpages   huge_pages  waste
>   [     0,     1) 186         0.00%
>   [     1,     2) 23          0.01%
>   [     2,     4) 36          0.02%
>   [     4,     8) 67          0.08%
>   [     8,    16) 80          0.23%
>   [    16,    32) 109         0.61%
>   [    32,    64) 44          0.49%
>   [    64,   128) 12          0.30%
>   [   128,   256) 28          1.54%
>   [   256,   513) 159        18.03%
> 
> In the case, there are 187 huge pages (25% of the total huge pages)
> which contain more then 128 zero subpages. And these huge pages
> lead to 19.57% waste of the total rss. It means we can reclaim
> 19.57% memory by splitting the 187 huge pages and reclaiming the
> zero subpages.
> 
> This patchset introduce a new mechanism to split the huge page
> which has zero subpages and reclaim these zero subpages.
> 
> We add the anonymous huge page to a list to reduce the cost of
> finding the huge page. When the memory reclaim is triggering,
> the list will be walked and the huge page contains enough zero
> subpages may be reclaimed. Meanwhile, replace the zero subpages
> by ZERO_PAGE(0). 

Does it actually help your workload?

I mean this will only be triggered via vmscan that was going to split
pages and free anyway.

You prioritize splitting THP and freeing zero subpages over reclaiming
other pages. It may or may not be right thing to do, depending on
workload.

Maybe it makes more sense to check for all-zero pages just after
split_huge_page_to_list() in vmscan and free such pages immediately rather
then add all this complexity?

> Yu Zhao has done some similar work when the huge page is swap out
> or migrated to accelerate[1]. While we do this in the normal memory
> shrink path for the swapoff scene to avoid OOM.
> 
> In the future, we will do the proactive reclaim to reclaim the "cold"
> huge page proactively. This is for keeping the performance of thp as
> for as possible. In addition to that, some users want the memory usage
> using thp is equal to the usage using 4K.

Proactive reclaim can be harmful if your max_ptes_none allows to recreate
THP back.

-- 
 Kirill A. Shutemov