Re: Re: [RFC] mm: support multi_freearea to the reduction of external fragmentation


 



Hi David Hildenbrand,

>> And you don't mention what the baseline configuration was. For example,
>> how was compaction configured?
 
>> Just to clarify, what is monkey?
 
>> Monkey HTTP server? MonkeyTest disk benchmark? UI/Application Exerciser
>> Monkey?
-------------------------------------------------------------------------------------
I am sorry that I didn't give a clear explanation of Monkey.
It means the "UI/Application Exerciser Monkey" from Google.

Please let me describe our tests:

1. Recording COMPACT_STALL
We tested the patch on linux-4.4/linux-4.9/linux-4.14/linux-4.19 and the
results show that the patch is effective in reducing COMPACT_STALL.
    - Run Monkey for 12 hours.
    - Record COMPACT_STALL after the test (a sketch of how we read the
      counter is shown after the table below).

Test result: COMPACT_STALL was reduced by 95.6% with the patch
(on a machine with 4 gigabytes of physical memory, running linux-4.19).
-----------------------------------------
                     |   COMPACT_STALL
-----------------------------------------
   original          |            2189
-----------------------------------------
   optimization      |              95
-----------------------------------------
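For reference, this is how the counter can be sampled before and after a run:
a minimal userspace sketch (our own test helper, not part of the patch) that
pulls the "compact_stall" line out of /proc/vmstat, which is how the field is
exposed on our 4.19 test kernels.

#include <stdio.h>
#include <string.h>

/* Return the current compact_stall count from /proc/vmstat, or -1 on error. */
static long read_compact_stall(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char name[64];
	long value, ret = -1;

	if (!f)
		return -1;
	while (fscanf(f, "%63s %ld", name, &value) == 2) {
		if (!strcmp(name, "compact_stall")) {
			ret = value;
			break;
		}
	}
	fclose(f);
	return ret;
}

int main(void)
{
	printf("compact_stall = %ld\n", read_compact_stall());
	return 0;
}

We run it once before the 12-hour Monkey run and once after, and report the
difference.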

I fully agree that compaction is valuable, but compaction also consumes CPU
and increases allocation-stall time. So if we can keep more free high-order
pages in buddy instead of single pages, it will decrease COMPACT_STALL and
speed up memory allocation.
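To make this concrete (it is the 64K example further down in the quoted mail),
here is a purely illustrative in-kernel sketch, not part of the patch, of the
two call patterns: when an order-4 block is free in buddy, 64K bytes is a
single call; otherwise the same amount of memory takes sixteen order-0 calls.

#include <linux/module.h>
#include <linux/gfp.h>

/* Illustrative example module only; names and structure are our own. */
static int __init order_demo_init(void)
{
	struct page *block, *singles[16];
	int i;

	/* 64K bytes in one call when an order-4 page is free in buddy. */
	block = alloc_pages(GFP_KERNEL, 4);

	/* The same amount of memory as sixteen separate order-0 calls. */
	for (i = 0; i < 16; i++)
		singles[i] = alloc_pages(GFP_KERNEL, 0);

	if (block)
		__free_pages(block, 4);
	for (i = 0; i < 16; i++)
		if (singles[i])
			__free_pages(singles[i], 0);
	return 0;
}

static void __exit order_demo_exit(void)
{
}

module_init(order_demo_init);
module_exit(order_demo_exit);
MODULE_LICENSE("GPL");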

2. Recording the speed of high-order-page allocations (order = 4 and order = 8)
Before and after the optimization, we measured the speed of high-order-page
allocation after 120 hours of Monkey on 10 Android mobile phones, and the
results show that the speed increased by more than 18%.

We also ran a test of our own design
(on a machine with 4 gigabytes of physical memory, running linux-4.19):
we modelled typical user behaviour, constantly starting and operating
different applications for 120 hours, and recorded that COMPACT_STALL
decreased by more than 90% and the speed of high-order-page allocation
increased by more than 15%.
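For reference, the allocation latency can be measured roughly as in the sketch
below; the batch size, GFP flags and ktime-based timing are our assumptions
about such a measurement, not something taken from the patch itself.

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/ktime.h>

#define BATCH	64

/* Illustrative example module only; it times a batch of order-4 allocations. */
static int __init alloc_speed_init(void)
{
	struct page *pages[BATCH];
	ktime_t start, end;
	int i;

	start = ktime_get();
	for (i = 0; i < BATCH; i++)
		pages[i] = alloc_pages(GFP_KERNEL, 4);
	end = ktime_get();

	pr_info("order-4: %lld ns for %d allocations\n",
		ktime_to_ns(ktime_sub(end, start)), BATCH);

	for (i = 0; i < BATCH; i++)
		if (pages[i])
			__free_pages(pages[i], 4);
	return 0;
}

static void __exit alloc_speed_exit(void)
{
}

module_init(alloc_speed_init);
module_exit(alloc_speed_exit);
MODULE_LICENSE("GPL");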

And I have some questions; I hope you can guide me when you are free.
1) What does "how was compaction configured" refer to? (What we would check
    from userspace is sketched after the lists below.)
    Does it mean the members in struct zone, such as:
        unsigned int compact_considered;
        unsigned int compact_defer_shift;
        int compact_order_failed;
        bool compact_blockskip_failed;
    Or some macro values, such as:
        PAGE_ALLOC_COSTLY_ORDER = 3
        MIN_COMPACT_PRIORITY = 1
        MAX_COMPACT_RETRIES = 16
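For context, the only compaction setting we looked at from userspace on the
baseline machine is the sysctl entry below (this is our guess at what
"configuration" may cover and may not be what you meant). A tiny sketch that
dumps extfrag_threshold, one compaction-related runtime tunable we know of on
4.19:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/extfrag_threshold", "r");
	int threshold;

	/*
	 * extfrag_threshold influences whether compaction or reclaim is
	 * preferred for high-order allocations; the default is 500.
	 */
	if (f && fscanf(f, "%d", &threshold) == 1)
		printf("extfrag_threshold = %d\n", threshold);
	if (f)
		fclose(f);
	return 0;
}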

>> 1) multi freearea (which might
>> be problematic with sparsity)
2) Can you please tell me what sparsity means here and what its impact on
    multi freearea would be, and whether there is any documentation about it?




>> IIRC, there are plenty. One example is will-it-scale.
>>
>> Have a look at https://apc01.safelinks.protection.outlook.com/?url="">

Thank you very much; we will run the tests and see whether there is any benefit.


lipeifeng@xxxxxxxx
 
From: David Hildenbrand
Date: 2021-04-26 16:37
To: lipeifeng@xxxxxxxx; Vlastimil Babka; peifengl55; schwidefsky; heiko.carstens; zhangshiming; zhouhuacai; guoweichao; guojian
CC: linux-s390; linux-kernel; linux-mm
Subject: Re: [RFC] mm: support multi_freearea to the reduction of external fragmentation
On 26.04.21 05:19, lipeifeng@xxxxxxxx wrote:
>
>  >> Let's consider part 3 only and ignore the 1) multi freearea (which might
>  >> be problematic with sparsity) and 2) the modified allocation scheme
>  >> (which doesn't quite make sense to me yet, e.g., because we group by
>  >> mobility and have compaction in place; I assume this really only helps
>  >> in some special cases -- like the test case you are giving; I might be
>  >> wrong)
>  >> Right now, we decide whether to put to head or tail based on how likely
>  >> it is that we might merge to a higher-order page (buddy_merge_likely())
>  >> in the future. So we only consider the current "neighborhood" of the
>  >> page we're freeing. As we restrict our neighborhood to MAX_ORDER - 1
>  >> pages (what we can actually merge). Of course, we can easily be wrong
>  >> here. Grouping by movability and compaction only helps to some degree I
>  >> guess.
>  >> AFAIK, what you propose is basing the decisions where to place a page
>  >> (in addition?) on a median_pfn. Without 1) and 2) I cannot completely
>  >> understand if 3) itself would help at all (and how to set the
>  >> median_pfn). But it would certainly be interesting if we can tweak the
>  >> current logic to better identify merge targets simply by tweaking
>  >> buddy_merge_likely() or the assumptions it is making.
>
>
>
> Hi David Hildenbrand, Vlastimil Babka:
>      Thank you very much for the advice.
>
>>> 2) the modified allocation scheme
>  >> (which doesn't quite make sense to me yet, e.g., because we group by
>  >> mobility and have compaction in place; I assume this really only helps
>  >> in some special cases -- like the test case you are giving;
>   ---------------------------------------------------------------------------------
> 1) Divide memory into several segments by page PFN
> 2) Select the corresponding free_area for page allocation
>      These two parts are for the same purpose:
> low-order-page allocations will be concentrated in the front area of
> physical memory, so that there is little memory pollution in the back
> area of memory and the success probability of high-order allocation
> would be improved.
>
>      I think that it would help in almost all cases of high-order-page
> allocation, not only in special cases, because it can keep more
> high-order free pages in buddy, for example:
 
See, and I am not convinced that this is the case, because you really
only report one example (Monkey) and I have to assume it is a special
case then.
 
>
>   * When a user allocates 64K bytes, if the unit is a page (4K bytes),
>     it takes 16 allocations; if the unit is 64K bytes, it only takes one.
>
>   * If there are more free high-order pages in buddy, there are fewer
>     compaction stalls in the allocation process, so the allocation-stall
>     time would be shortened.
>
>      We tested the speed of high-order-page (order = 4 and order = 8)
> allocation after Monkey and found that it increased by more than 18%.
>
 
And you don't mention what the baseline configuration was. For example,
how was compaction configured?
 
Just to clarify, what is monkey?
 
Monkey HTTP server? MonkeyTest disk benchmark? UI/Application Exerciser
Monkey?
 
> 3) Adjust the location of free-pages in the free_list
>>>Without 1) and 2) I cannot completely
>  >>understand if 3) itself would help at all (and how to set the median_pfn)
> -----------------------------------------------------------------------------------------------------
>      Median_pfn is set from the PFN range of the free_area. If part 3)
> were tried separately, without 1) and 2), the simple setting would be the
> median of the entire memory. But I think it plays a better role in the
> optimization when based on 1) and 2).
>
>
>
>  >> Last but not least, there have to be more benchmarks and test cases that
>  >> prove that other workloads won't be degraded to a degree that people
>  >> care about; as one example, this includes runtime overhead when
>  >> allocating/freeing pages.
> ---------------------------------------------
> 1. For the modification of buddy: the modified allocation scheme 1)+2)
>      Is there any standard, detailed test list for allocator modifications
> in the community, such as benchmarks or other tests? If I pass the tests
> required by the community, would that prove that the patch does not degrade
> other workloads to a degree that people care about, so that it can be merged
> into the baseline?
 
IIRC, there are plenty. One example is will-it-scale.
 
Have a look at https://apc01.safelinks.protection.outlook.com/?url="">
 
 
--
Thanks,
 
David / dhildenb
 
