On 4/22/2021 12:31 PM, Florian Fainelli wrote:
>> For
>>
>> https://lkml.kernel.org/r/20210121175502.274391-3-minchan@xxxxxxxxxx
>>
>> to do its work you'll have to pass __GFP_NORETRY to
>> alloc_contig_range(). This requires CMA adaptions, from where we call
>> alloc_contig_range().
>
> Yes, I did modify the alloc_contig_range() caller to pass GFP_KERNEL |
> __GFP_NORETRY. I did run for a more iterations (1000) and the results
> are not very conclusive as with __GFP_NORETRY the allocation time per
> allocation was not significantly better, in fact it was slightly worse
> by 100us than without.
>
> My x86 VM with 1GB of DRAM including 512MB being in ZONE_MOVABLE does
> shows identical numbers for both 4.9 and 5.4 so this must be something
> specific to ARM64 and/or the code we added to create a ZONE_MOVABLE on
> that architecture since movablecore does not appear to have any effect
> unlike x86.

We tracked the slowdowns down to two major contributors:

- For a reason that we do not fully understand yet, the same cpufreq
  governor (conservative) did not slow alloc_contig_range() down on 4.9
  as much as it does on 5.4. Running the tests with the performance
  cpufreq governor works a tad better, and the results are more
  consistent from run to run, with a smaller variation.

- Another large contributor to the slowdown was having
  CONFIG_IRQSOFF_TRACER enabled. After
  c3bc8fd637a9623f5c507bd18f9677effbddf584 ("tracing: Centralize
  preemptirq tracepoints and unify their usage") we now prepare the
  arguments for tracing even if we end up not using them because tracing
  is not enabled at runtime. Getting the caller function's return
  address is cheap on arm64 for level == 0, but getting the preceding
  caller involves a backtrace walk, which is expensive (see
  arch/arm64/kernel/return_address.c).

So with these two variables eliminated we are only about 2x slower on
5.4 than we were on 4.9, and this is acceptable for our use case.

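To make the second point a bit more concrete, here is a minimal
user-space sketch of the pattern, not the kernel code itself. Every name
in it (trace_enabled, expensive_parent_lookup(), irq_off_eager(),
irq_off_gated()) is made up for illustration, and the counter only
stands in for the cost of the backtrace walk. It shows that when the
argument is computed before the "is tracing on?" check, the expensive
lookup runs on every call even though nothing is ever logged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static bool trace_enabled;           /* models "tracing enabled at runtime" */
static unsigned long walk_count;     /* number of expensive lookups performed */
static unsigned long events_logged;  /* number of events actually recorded */

/* Models an expensive lookup, e.g. a backtrace walk to find the parent caller. */
unsigned long expensive_parent_lookup(void)
{
	walk_count++;
	return 0x1234UL; /* placeholder address, not a real return address */
}

/* The event is only recorded when tracing is enabled. */
void trace_event(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	if (trace_enabled)
		events_logged++;
}

/* Eager: the argument is evaluated before the enabled check is reached. */
void irq_off_eager(unsigned long ip)
{
	trace_event(ip, expensive_parent_lookup());
}

/* Gated: the expensive argument is only computed when tracing is on. */
void irq_off_gated(unsigned long ip)
{
	if (trace_enabled)
		trace_event(ip, expensive_parent_lookup());
}

void reset_counters(void)
{
	walk_count = 0;
	events_logged = 0;
}

/* Runs both variants with tracing disabled and checks the lookup counts. */
int run_demo(void)
{
	trace_enabled = false;

	reset_counters();
	for (int i = 0; i < 1000; i++)
		irq_off_eager(0x1000UL);
	assert(walk_count == 1000);   /* paid for 1000 lookups ... */
	assert(events_logged == 0);   /* ... and logged nothing */

	reset_counters();
	for (int i = 0; i < 1000; i++)
		irq_off_gated(0x1000UL);
	assert(walk_count == 0);      /* no lookups when tracing is off */
	assert(events_logged == 0);

	return 0;
}
```

Calling run_demo() from a small main() is enough to exercise it. The
real cost on arm64 obviously depends on stack depth and on how often the
irq/preempt hooks fire, which this sketch does not try to model.
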
I would not say the case is closed but at least we understand it better.
We now have 5.10 brought up to speed so any new investigation will be
focused on that kernel.

Thanks a lot for your help David!
-- 
Florian