On 22.04.21 19:50, Florian Fainelli wrote:
On 4/22/2021 1:56 AM, David Hildenbrand wrote:
On 22.04.21 09:49, Michal Hocko wrote:
Cc David and Oscar who are familiar with this code as well.
On Wed 21-04-21 11:36:01, Florian Fainelli wrote:
Hi all,
I have been trying for the past few days to identify the source of a
performance regression that we are seeing with the 5.4 kernel but not
with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
challenging at the moment but will happen eventually.
What we are seeing is a ~3x increase in the time needed for
alloc_contig_range() to allocate 1GB in 2MB blocks. The system
is idle at the time and there are no other contenders for memory other
than the user-space programs already started (DHCP client, shell, etc.).
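To make the scale of the measurement concrete, here is a minimal sketch of the arithmetic behind "1GB in 2MB blocks". The helper names are hypothetical, and the 4KiB base page size is an assumption, since the report does not state which page size the ARM64 system uses.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the report: 1GB requested in 2MB blocks. */
#define TOTAL_BYTES (1ULL << 30)
#define BLOCK_BYTES (2ULL << 20)

/* Assumption: 4KiB base pages; the thread does not state the page size. */
#define ASSUMED_PAGE_BYTES (4ULL << 10)

/* Number of block-sized allocations needed to cover `total` bytes (rounded up). */
uint64_t blocks_needed(uint64_t total, uint64_t block)
{
    return (total + block - 1) / block;
}

/* Number of base pages that make up one block. */
uint64_t pages_per_block(uint64_t block, uint64_t page)
{
    return block / page;
}
```

Under these assumptions, blocks_needed(TOTAL_BYTES, BLOCK_BYTES) is 512, so the test issues 512 separate contiguous allocations, each spanning pages_per_block(BLOCK_BYTES, ASSUMED_PAGE_BYTES) = 512 base pages. A slowdown in the per-call path is therefore multiplied 512 times in the total.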
Hi,
If you can easily reproduce it, it might be worth just trying to bisect;
that could be faster than manually poking around in the code.
Also, it would be worth having a look at the state of upstream Linux.
Upstream Linux developers tend to not care about minor performance
regressions on oldish kernels.
This is a big pain point here and I cannot agree more, but until we
bridge that gap, this is not exactly easy to do for me unfortunately and
neither is bisection :/
There has been work on improving exactly the situation you are
describing -- a "fail fast" / "no retry" mode for alloc_contig_range().
Maybe it tackles exactly this issue.
https://lkml.kernel.org/r/20210121175502.274391-3-minchan@xxxxxxxxxx
Minchan is already on cc.
This patch does not appear to be helping. In fact, I had locally applied
this patch from way back when:
https://lkml.org/lkml/2014/5/28/113
which would effectively do this unconditionally. Let me see if I can
reproduce this problem on an x86 virtual machine operating under
conditions similar to ours.
How exactly are you allocating these 2MiB blocks?
Via CMA->alloc_contig_range() or via alloc_contig_range() directly? I
assume via CMA.
For
https://lkml.kernel.org/r/20210121175502.274391-3-minchan@xxxxxxxxxx
to do its work you'll have to pass __GFP_NORETRY to
alloc_contig_range(). This requires adapting CMA, which is where
alloc_contig_range() gets called from.
--
Thanks,
David / dhildenb