On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>On 08/23/2018 06:07 AM, Wei Yang wrote:
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>          unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>          struct mem_section *section;
>>
>> -        if (mem_section[root])
>> +        if (likely(mem_section[root]))
>>                  return -EEXIST;
>
>We could add likely()/unlikely() to approximately a billion if()s around
>the kernel if we felt like it.  We don't because it's messy and it
>actually takes away choices from the compiler.
>
>Please don't send patches like this unless you have some *actual*
>analysis that shows the benefit of the patch.  Performance numbers are best.

Thanks all for your comments, Michal, Dave and Oscar.

Well, maybe I took it for granted, so let me put more words on this. To be
honest, my analysis may be only partially valid, so if the cost is higher
than the gain, please let me know.

Below is my analysis and test result for this patch.
------------------------------------------------------

During bootup, the call flow looks like this:

    sparse_memory_present_with_active_regions()
        memory_present()
            sparse_index_init()

sparse_memory_present_with_active_regions() iterates over pfns continuously
for the whole system RAM, so sparse_index_init() sees section_nr increase
continuously. Usually, we don't expect many large holes, right?

Each time mem_section[root] is NULL, SECTIONS_PER_ROOT mem_section entries
are allocated. This means that, out of every SECTIONS_PER_ROOT checks, only
the first one is false. So the probability of the check being false is
(1 / SECTIONS_PER_ROOT).

SECTIONS_PER_ROOT is defined as (PAGE_SIZE / sizeof (struct mem_section)).

On my x86_64 machine, PAGE_SIZE is 4KB and mem_section is 16B.

    SECTIONS_PER_ROOT = 4K / 16 = 256.

So the check on mem_section[root] has a (1 / 256) chance of being false and
a (255 / 256) chance of being true.

In theory, this looks like a "likely" to me.

In practice, when the system RAM is a multiple of
((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT), the "likely" chance is
(255 / 256); otherwise the chance is lower.

On my x86_64 machine, SECTION_SIZE_BITS is defined as 27.

    ((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT) = 32GB

    System RAM size   32G          16G          8G         4G
    Possibility       (255 / 256)  (127 / 128)  (63 / 64)  (31 / 32)

Generally, in my mind, if we iterate over pfns continuously and there are no
large holes, the check on mem_section[root] is likely to be true.

Finally, here is the test result on my 4G virtual machine. I added printk
before and after sparse_memory_present_with_active_regions() and tested
three times with and without "likely".

                without      with
    Elapsed     0.000252     0.000250    -0.8%

The benefit seems too small to see on a 4G virtual machine, or the
measurement may not even be stable. I am not sure whether we would see a
visible effect on a 32G machine.

Well, that is all of my analysis and test results. I did the optimization
based on my own experience and understanding. If this does not qualify, I
would be very glad to hear your reasoning, so that I can learn from your
experience.

Thanks all for your comments again :-)

-- 
Wei Yang
Help you, Help me