On 09/03/16 15:17, Li Zhang wrote:
> On Tue, Mar 8, 2016 at 10:45 PM, Balbir Singh <bsingharora@xxxxxxxxx> wrote:
>>
>> On 08/03/16 14:55, Li Zhang wrote:
>>> From: Li Zhang <zhlcindy@xxxxxxxxxxxxxxxxxx>
>>>
>>> Upstream has supported parallel page initialisation for X86 and the
>>> boot time is improved greatly. Some tests have been done for Power.
>>>
>>> Here are the results I got with different memory sizes.
>>>
>>> * 4GB memory:
>>>     boot time is as follows:
>>>     with patch vs without patch: 10.4s vs 24.5s
>>>     boot time is improved by 57%
>>> * 200GB memory:
>>>     boot time looks the same with and without patches.
>>>     boot time is about 38s
>>> * 32TB memory:
>>>     boot time looks the same with and without patches
>>>     boot time is about 160s.
>>>     The boot time is much shorter than X86 with 24TB memory.
>>>     From community discussion, it costs about 694s for an X86 24TB system.
>>>
>>> From the code, parallel initialisation improves performance by
>>> deferring memory initialisation to kswapd with N kthreads, so it should
>>> improve performance theoretically.
>>>
>>> From the test results, on X86 performance is improved greatly with huge
>>> memory. But on the Power platform, it is improved greatly with less than
>>> 100GB memory. For huge memory, it is not improved greatly. But it saves
>>> time with several threads at least, as the following information
>>> shows (32TB system log):
>>>
>>> [ 22.648169] node 9 initialised, 16607461 pages in 280ms
>>> [ 22.783772] node 3 initialised, 23937243 pages in 410ms
>>> [ 22.858877] node 6 initialised, 29179347 pages in 490ms
>>> [ 22.863252] node 2 initialised, 29179347 pages in 490ms
>>> [ 22.907545] node 0 initialised, 32049614 pages in 540ms
>>> [ 22.920891] node 15 initialised, 32212280 pages in 550ms
>>> [ 22.923236] node 4 initialised, 32306127 pages in 550ms
>>> [ 22.923384] node 12 initialised, 32314319 pages in 550ms
>>> [ 22.924754] node 8 initialised, 32314319 pages in 550ms
>>> [ 22.940780] node 13 initialised, 33353677 pages in 570ms
>>> [ 22.940796] node 11 initialised, 33353677 pages in 570ms
>>> [ 22.941700] node 5 initialised, 33353677 pages in 570ms
>>> [ 22.941721] node 10 initialised, 33353677 pages in 570ms
>>> [ 22.941876] node 7 initialised, 33353677 pages in 570ms
>>> [ 22.944946] node 14 initialised, 33353677 pages in 570ms
>>> [ 22.946063] node 1 initialised, 33345485 pages in 580ms
>>>
>>> It saves about 550*16 ms at least, although that is negligible compared
>>> with the total boot time of about 160 seconds. What's more, the boot
>>> time is much shorter on Power, even without the patches, than on x86
>>> for huge memory machines.
>>>
>>> So it is still worth enabling this patchset for Power.
>>>
>>>
> Hi Balbir,
>
> Thanks for your review.
>
>> The patchset looks good, two questions
>>
>> 1. The patchset is still necessary for
>>    a. systems with smaller amount of RAM?
> I think it is. Currently, I have tested systems with 4GB and 50GB, and
> boot time is improved.
> We may test more systems with different memory sizes in the future.
>>    b. Theoretically it improves boot time?
> The boot time is improved a little bit for huge memory systems,
> and the improvement can be ignored.
> But I think it's still necessary to enable this feature.
>
>> 2. the pgdat->node_spanned_pages >> 8 sounds arbitrary
>>    On a system with 2TB*16 nodes, it would initialize about 8GB before calling deferred init?
>>    Don't we need at-least 32GB + space for other early hash allocations
>>    BTW, My expectation was that 32TB would imply 32GB+32GB of large hash allocations early on
> pgdat->node_spanned_pages >> 8 is the amount of memory that is
> initialised early on one node.
> On a system with 2TB * 16 nodes, it will initialise 16*8GB = 128GB.
> I am not sure if it can be reduced to >> 16 while making sure that all
> the architectures with different memory sizes work well. This was also
> mentioned in the early discussion for X86, so I chose >> 8.
>
> * From the code, as follows:
>
> free_area_init_core ->
>     memmap_init ->
>         update_defer_init
>
> #define memmap_init(size, nid, zone, start_pfn) \
>         memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
>
> memmap_init_zone works on a single zone, but free_area_init_core
> helps find the highest zone on the node. And update_defer_init() gets
> the maximum amount of memory to initialise on the highest zone of a
> node, which is reserved for early initialisation.
>
> static void __paginginit free_area_init_core(struct pglist_data *pgdat)
> {
>         ...
>         for (j = 0; j < MAX_NR_ZONES; j++) {
>                 ....
>                 memmap_init(size, nid, j, zone_start_pfn); // find the highest zone on a node
>                 ...
>         }
> }
>
> * From the dmesg log, after applying this patchset, the system has
> 123013440K (about 117GB) available, which is enough for the Dentry
> cache hash table and Inode-cache hash table in this system.
>
> [ 0.000000] Memory: 123013440K/31739871232K available (8000K kernel code, 1856K rwdata,
> 3384K rodata, 6208K init, 2544K bss, 28531136K reserved, 0K cma-reserved)
>
> Thanks :)
>

Looks good! It seems the real benefit is for smaller systems - thanks for clarifying.
Please check if CMA is affected in any way.

Acked-by: Balbir Singh <bsingharora@xxxxxxxxx>

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .