Dear Linux folks, Am 21.08.2018 um 11:37 schrieb Paul Menzel: > [Removed non-working Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>] > On 08/17/18 10:12, Paul Menzel wrote: > >> With the merge of branch 'x86-timers-for-linus' >> of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip early time >> stamps are now printed. >> >>> Early TSC based time stamping to allow better boot time analysis. >>> This comes with a general cleanup of the TSC calibration code which >>> grew warts and duct taping over the years and removes 250 lines of >>> code. Initiated and mostly implemented by Pavel with help from various >>> folks >> >> Looking at those early time stamps, in this case on an ASRock E350M1, >> there is a 160 ms delay in the code below. >> >> Before: >> >> ``` >> [ 0.000000] Initializing CPU#0 >> [ 0.000000] Initializing HighMem for node 0 (000373fe:000c7d3c) >> [ 0.000000] Initializing Movable for node 0 (00000000:00000000) >> [ 0.000000] Memory: 3225668K/3273580K available (8898K kernel code, 747K rwdata, 2808K rodata, 768K init, 628K bss, 47912K reserved, 0K cma-reserved, 2368760K highmem) >> ``` >> >> After: >> >> ``` >> [ 0.063473] Initializing CPU#0 >> [ 0.063484] Initializing HighMem for node 0 (00036ffe:000c7d3c) >> [ 0.229442] Initializing Movable for node 0 (00000000:00000000) >> [ 0.236020] Memory: 3225728K/3273580K available (8966K kernel code, 750K rwdata, 2828K rodata, 776K init, 640K bss, 47852K reserved, 0K cma-reserved, 2372856K highmem) >> ``` >> >> The code in question is from `arch/x86/mm/highmem_32.c`. >> >>> void __init set_highmem_pages_init(void) >>> { >>> struct zone *zone; >>> int nid; >>> >>> /* >>> * Explicitly reset zone->managed_pages because set_highmem_pages_init() >>> * is invoked before free_all_bootmem() >>> */ >>> reset_all_zones_managed_pages(); >>> for_each_zone(zone) { >>> unsigned long zone_start_pfn, zone_end_pfn; >>> >>> if (!is_highmem(zone)) >>> continue; >>> >>> zone_start_pfn = zone->zone_start_pfn; >>> zone_end_pfn = zone_start_pfn + zone->spanned_pages; >>> >>> nid = zone_to_nid(zone); >>> printk(KERN_INFO "Initializing %s for node %d (%08lx:%08lx)\n", >>> zone->name, nid, zone_start_pfn, zone_end_pfn); >>> >>> add_highpages_with_active_regions(nid, zone_start_pfn, >>> zone_end_pfn); >>> } >>> } >> >> And in there, it seems to be the function below. >> >>> void __init add_highpages_with_active_regions(int nid, >>> unsigned long start_pfn, unsigned long end_pfn) >>> { >>> phys_addr_t start, end; >>> u64 i; >>> >>> for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &start, &end, NULL) { >>> unsigned long pfn = clamp_t(unsigned long, PFN_UP(start), >>> start_pfn, end_pfn); >>> unsigned long e_pfn = clamp_t(unsigned long, PFN_DOWN(end), >>> start_pfn, end_pfn); >>> for ( ; pfn < e_pfn; pfn++) >>> if (pfn_valid(pfn)) >>> free_highmem_page(pfn_to_page(pfn)); >> >> Assuming the time stamps are correct, how can a profile that delay >> without adding print statements all over the place? Using ftrace >> doesn’t seem to work for me, probably because it’s that early. > > I added print statements, and got the result below. […] > So, the problem is with i = 2. Here is an even more verbose debug log. ``` [ 0.055628] Initializing CPU#0 [ 0.055638] Initializing HighMem for node 0 (00036ffe:000c7d3c) [ 0.055641] add_highpages_with_active_regions: start_pfn = 225278, end_pfn = 818492 [ 0.055645] add_highpages_with_active_regions: for_each_free_mem_range: i = 8589934592, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055646] add_highpages_with_active_regions: after for loop [ 0.055649] add_highpages_with_active_regions: for_each_free_mem_range: i = 12884901889, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055650] add_highpages_with_active_regions: after for loop [ 0.055653] add_highpages_with_active_regions: for_each_free_mem_range: i = 17179869185, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055654] add_highpages_with_active_regions: after for loop [ 0.055656] add_highpages_with_active_regions: for_each_free_mem_range: i = 21474836481, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055658] add_highpages_with_active_regions: after for loop [ 0.055660] add_highpages_with_active_regions: for_each_free_mem_range: i = 25769803777, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055661] add_highpages_with_active_regions: after for loop [ 0.055664] add_highpages_with_active_regions: for_each_free_mem_range: i = 30064771073, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055665] add_highpages_with_active_regions: after for loop [ 0.055668] add_highpages_with_active_regions: for_each_free_mem_range: i = 34359738369, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055669] add_highpages_with_active_regions: after for loop [ 0.055671] add_highpages_with_active_regions: for_each_free_mem_range: i = 38654705665, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055672] add_highpages_with_active_regions: after for loop [ 0.055675] add_highpages_with_active_regions: for_each_free_mem_range: i = 42949672961, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055676] add_highpages_with_active_regions: after for loop [ 0.055678] add_highpages_with_active_regions: for_each_free_mem_range: i = 47244640257, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055679] add_highpages_with_active_regions: after for loop [ 0.055682] add_highpages_with_active_regions: for_each_free_mem_range: i = 51539607553, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055683] add_highpages_with_active_regions: after for loop [ 0.055685] add_highpages_with_active_regions: for_each_free_mem_range: i = 55834574849, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055686] add_highpages_with_active_regions: after for loop [ 0.055688] add_highpages_with_active_regions: for_each_free_mem_range: i = 60129542145, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055690] add_highpages_with_active_regions: after for loop [ 0.055692] add_highpages_with_active_regions: for_each_free_mem_range: i = 64424509441, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055693] add_highpages_with_active_regions: after for loop [ 0.055695] add_highpages_with_active_regions: for_each_free_mem_range: i = 68719476737, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055697] add_highpages_with_active_regions: after for loop [ 0.055699] add_highpages_with_active_regions: for_each_free_mem_range: i = 73014444033, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055700] add_highpages_with_active_regions: after for loop [ 0.055702] add_highpages_with_active_regions: for_each_free_mem_range: i = 77309411329, nid = 0, pfn = 225278, e_pfn = 225278, before for loop [ 0.055704] add_highpages_with_active_regions: after for loop [ 0.055706] add_highpages_with_active_regions: for_each_free_mem_range: i = 77309411330, nid = 0, pfn = 225278, e_pfn = 818492, before for loop [ 0.222395] add_highpages_with_active_regions: after for loop [ 0.222398] Initializing Movable for node 0 (00000000:00000000) [ 0.222401] add_highpages_with_active_regions: start_pfn = 0, end_pfn = 0 [ 0.222404] add_highpages_with_active_regions: for_each_free_mem_range: i = 8589934592, nid = 0, pfn = 0, e_pfn = 0, before for loop [ 0.222405] add_highpages_with_active_regions: after for loop [ 0.222408] add_highpages_with_active_regions: for_each_free_mem_range: i = 12884901889, nid = 0, pfn = 0, e_pfn = 0, before for loop [ 0.222409] add_highpages_with_active_regions: after for loop ``` So “free’ing” pfn = 225278 to e_pfn = 818492 in the for loop takes 160 ms. Is there a way to get rid of the `pfn_valid(pfn)`? Kind regards, Paul