Michal Hocko <mhocko@xxxxxxxxxx> writes:

> On Thu 01-03-18 14:28:44, Aaron Lu wrote:
>> When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy,
>> the zone->lock is held and then pages are chosen from PCP's migratetype
>> list. While there is actually no need to do this 'choose part' under
>> lock since it's PCP pages, the only CPU that can touch them is us and
>> irq is also disabled.
>>
>> Moving this part outside could reduce lock held time and improve
>> performance. Test with will-it-scale/page_fault1 full load:
>>
>> kernel        Broadwell(2S)  Skylake(2S)    Broadwell(4S)   Skylake(4S)
>> v4.16-rc2+    9034215        7971818        13667135        15677465
>> this patch    9536374 +5.6%  8314710 +4.3%  14070408 +3.0%  16675866 +6.4%
>>
>> What the test does is: starts $nr_cpu processes and each will repeatedly
>> do the following for 5 minutes:
>> 1 mmap 128M anonymous space;
>> 2 write access to that space;
>> 3 munmap.
>> The score is the aggregated iteration.
>
> Iteration count I assume. I am still quite surprised that this would
> have such a large impact.

The test is run at full load, which means close to or more than 100
processes allocate memory in parallel. According to Amdahl's law, the
speedup of a parallel program is limited by, and at high process counts
dominated by, its serial part. In this case that is the code protected
by zone->lock. So a small change to the code under zone->lock can make
a much bigger change to the overall score.

Best Regards,
Huang, Ying