On Fri, Aug 24, 2018 at 11:07:17PM +0800, Wei Yang wrote: >On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote: >>On 08/23/2018 06:07 AM, Wei Yang wrote: >>> --- a/mm/sparse.c >>> +++ b/mm/sparse.c >>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid) >>> unsigned long root = SECTION_NR_TO_ROOT(section_nr); >>> struct mem_section *section; >>> >>> - if (mem_section[root]) >>> + if (likely(mem_section[root])) >>> return -EEXIST; >> >>We could add likely()/unlikely() to approximately a billion if()s around >>the kernel if we felt like it. We don't because it's messy and it >>actually takes away choices from the compiler. >> >>Please don't send patches like this unless you have some *actual* >>analysis that shows the benefit of the patch. Performance numbers are best. > Hi, Is my analysis reasonable? Or which part is not valid? >Thanks all for your comments, Michal, Dave and Oscar. > >Well, maybe I took it for granted, so let me put more words on this. To be >honest, my analysis maybe partially effective, so if the cost is higher than >the gain, please let me know. > >Below is my analysis and test result for this patch. >------------------------------------------------------ > >During bootup, the call flow looks like this. > > sparse_memory_present_with_active_regions() > memory_present() > sparse_index_init() > >sparse_memory_present_with_active_regions() iterates on pfn continuously for >the whole system RAM, which leads to sparse_index_init() will iterate >section_nr continuously. Usually, we don't expect many large holes, right? > >Each time when mem_section[root] is null, SECTIONS_PER_ROOT number of >mem_section will be allocated. This means, for SECTIONS_PER_ROOT number of >check, only the first check is false. So the possibility to be false is >(1 / SECTIONS_PER_ROOT). > >SECTIONS_PER_ROOT is defined as (PAGE_SIZE / sizeof (struct mem_section)). > >On my x86_64 machine, PAGE_SIZE is 4KB and mem_section is 16B. > > SECTIONS_PER_ROOT = 4K / 16 = 256. > >So the check for mem_section[root] is (1 / 256) chance to be invalid and >(255 / 256) valid. In theory, this value seems to be a "likely" to me. > >In practice, when the system RAM is multiple times of >((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT), the "likely" chance is >(255 / 256), otherwise the chance would be less. > >On my x86_64 machine, SECTION_SIZE_BITS is defined to 27. > > ((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT) = 32GB > > System RAM size 32G 16G 8G 4G > Possibility (255 / 256) (127 / 128) (63 / 64) (31 / 32) > >Generally, in my mind, if we iterate pfn continuously and there is no large >holes, the check on mem_section[root] is likely to be true. > >At last, here is the test result on my 4G virtual machine. I added printk >before and after sparse_memory_present_with_active_regions() and tested three >times with/without "likely". > > without with > Elapsed 0.000252 0.000250 -0.8% > >The benefit seems to be too small on a 4G virtual machine or even this is not >stable. Not sure we can see some visible effect on a 32G machine. > > >Well, above is all my analysis and test result. I did the optimization based >on my own experience and understanding. If this is not qualified, I am very >glad to hear from your statement, so that I would learn more from your >experience. > >Thanks all for your comments again :-) > > >-- >Wei Yang >Help you, Help me -- Wei Yang Help you, Help me