Re: [PATCH 1/3] mm/sparse: add likely to mem_section[root] check in sparse_index_init()

Wei Yang <richard.weiyang@xxxxxxxxx> · Fri, 24 Aug 2018 23:07:17 +0800

On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>On 08/23/2018 06:07 AM, Wei Yang wrote:
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>  	struct mem_section *section;
>>  
>> -	if (mem_section[root])
>> +	if (likely(mem_section[root]))
>>  		return -EEXIST;
>
>We could add likely()/unlikely() to approximately a billion if()s around
>the kernel if we felt like it.  We don't because it's messy and it
>actually takes away choices from the compiler.
>
>Please don't send patches like this unless you have some *actual*
>analysis that shows the benefit of the patch.  Performance numbers are best.

Thanks all for your comments, Michal, Dave and Oscar.

Well, maybe I took too much for granted, so let me explain in more detail. To
be honest, my analysis may be only partially valid, so if the cost is higher
than the gain, please let me know.

Below is my analysis and test result for this patch.
------------------------------------------------------

During bootup, the call flow looks like this.

    sparse_memory_present_with_active_regions()
        memory_present()
            sparse_index_init()

sparse_memory_present_with_active_regions() iterates over pfns contiguously for
the whole system RAM, so sparse_index_init() is called with contiguous
section_nr values. Usually, we don't expect many large holes, right?

Each time mem_section[root] is NULL, SECTIONS_PER_ROOT mem_section entries are
allocated. This means that out of SECTIONS_PER_ROOT checks, only the first one
is false. So the probability of the check being false is
(1 / SECTIONS_PER_ROOT).
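
The counting argument can be sketched in userspace. This is only a rough
model, not the kernel code: model_index_init(), count_existing() and
MODEL_NR_ROOTS are made-up names, the "allocation" is just a flag, and the
walk is assumed to start at section 0 with no holes.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_ROOTS 64UL	/* arbitrary size, for this model only */

static bool model_root_present[MODEL_NR_ROOTS];

/*
 * Model of the check in sparse_index_init(): the first call for a root
 * "allocates" it, every later call for the same root returns -EEXIST.
 */
static int model_index_init(unsigned long section_nr, unsigned long per_root)
{
	unsigned long root;

	if (per_root == 0)
		return -EINVAL;
	root = section_nr / per_root;
	if (root >= MODEL_NR_ROOTS)
		return -EINVAL;
	if (model_root_present[root])
		return -EEXIST;
	model_root_present[root] = true;
	return 0;
}

/* Walk nr_sections contiguous sections from 0; count the -EEXIST results. */
static unsigned long count_existing(unsigned long nr_sections,
				    unsigned long per_root)
{
	unsigned long nr, existing = 0;

	memset(model_root_present, 0, sizeof(model_root_present));
	for (nr = 0; nr < nr_sections; nr++)
		if (model_index_init(nr, per_root) == -EEXIST)
			existing++;
	return existing;
}
```

For example, count_existing(8, 4) returns 6: two of the eight checks allocate
a root and the other six find it already present.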

SECTIONS_PER_ROOT is defined as (PAGE_SIZE / sizeof (struct mem_section)).

On my x86_64 machine, PAGE_SIZE is 4KB and mem_section is 16B.

    SECTIONS_PER_ROOT = 4K / 16 = 256.

So the check on mem_section[root] has a (1 / 256) chance of being false and a
(255 / 256) chance of being true. In theory, this looks like a "likely" to me.
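
For reference, likely()/unlikely() are thin wrappers around the compiler's
__builtin_expect(). A minimal userspace sketch follows, assuming GCC or Clang;
root_check() is a hypothetical helper that only mirrors the shape of the check
in the hunk above, not the real function.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace equivalents of the kernel hint macros (GCC/Clang builtin). */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical helper with the same shape as the patched check. */
static int root_check(const void *root_entry)
{
	/* Hint: the entry is expected to be populated already. */
	if (likely(root_entry))
		return -EEXIST;
	return 0;
}
```

The hint never changes the result of the condition. It only tells the compiler
which branch to treat as the common path when laying out the code.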

In practice, when the system RAM is a multiple of
((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT), the "likely" chance is
(255 / 256); otherwise the chance is lower.

On my x86_64 machine, SECTION_SIZE_BITS is defined as 27.

    ((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT) = 32GB

      System RAM size      32G          16G          8G         4G
      Probability      (255 / 256)  (127 / 128)  (63 / 64)  (31 / 32)
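
The table entries can be reproduced with a few lines of C. This is a sketch
under stated assumptions: RAM is contiguous, starts at section 0 and is a
multiple of the section size; the constants are the x86_64 values quoted in
this mail; struct hit_ratio and likely_ratio() are illustrative names only.

```c
#define SECTION_SIZE_BITS	27UL	/* x86_64 value quoted above */
#define SECTIONS_PER_ROOT	256UL	/* 4K page / 16B mem_section */

struct hit_ratio {
	unsigned long hits;	/* checks that find mem_section[root] set */
	unsigned long checks;	/* total checks, one per section */
};

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Reduced fraction of checks expected to be true for a given RAM size. */
static struct hit_ratio likely_ratio(unsigned long long ram_bytes)
{
	unsigned long sections = (unsigned long)(ram_bytes >> SECTION_SIZE_BITS);
	/* One check per root allocates it, so it sees a NULL entry. */
	unsigned long roots = (sections + SECTIONS_PER_ROOT - 1) / SECTIONS_PER_ROOT;
	struct hit_ratio r = { sections - roots, sections };
	unsigned long g = gcd_ul(r.hits, r.checks);

	if (g > 1) {
		r.hits /= g;
		r.checks /= g;
	}
	return r;
}
```

With these assumptions, likely_ratio(4ULL << 30) gives 31 / 32 and
likely_ratio(32ULL << 30) gives 255 / 256, matching the table.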

Generally, in my mind, if we iterate over pfns contiguously and there are no
large holes, the check on mem_section[root] is likely to be true.

Finally, here is the test result on my 4G virtual machine. I added printk
calls before and after sparse_memory_present_with_active_regions() and tested
three times with and without "likely".

                without      with
     Elapsed   0.000252     0.000250   -0.8%

The benefit seems too small to see on a 4G virtual machine, or the measurement
may not even be stable. I am not sure whether there would be a visible effect
on a 32G machine.

Well, that is all of my analysis and test results. I did the optimization
based on my own experience and understanding. If this is not good enough, I
would be very glad to hear your feedback, so that I can learn from your
experience.

Thanks all for your comments again :-)

-- 
Wei Yang
Help you, Help me