Re: [PATCH v3] mm/vmalloc: lock contention optimization under multi-threading

Uladzislau Rezki <urezki@xxxxxxxxx> · Thu, 22 Feb 2024 13:52:07 +0100

Hello, Rulinhuang!

> Hi Uladzislau and Andrew, we have rebased it (Patch v4) on branch
> mm-unstable and remeasured it. Could you kindly help confirm if
> this is the right base to work on?
> Compared to the previous result at kernel v6.7, with a 5% performance
> gain on Intel Ice Lake (160 vCPU), we only saw a 0.6% gain with this
> commit as the base. But we think our modification still has some
> significance. On the one hand, this does reduce a critical section.
> On the other hand, we have a 4% performance gain on Intel Sapphire
> Rapids (224 vCPU), which suggests more performance improvement would
> likely be achieved when the core count of processors increases to
> hundreds or even thousands.
> Thank you again for your comments.
>
Judging by the patch, that was a correct rebase. Right, the small delta
on your 160 CPUs comes from removing a contention. On bigger systems the
impact is bigger, as you point out here with your 224 vCPU results, where
you see a 4% perf improvement.

So we should fix it. But the way it is fixed is not optimal from my
point of view, because the patch in question now spreads the internals
of alloc_vmap_area(), such as inserting the busy area, across many
parts.
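The trade-off above can be illustrated with a small userspace sketch in C. It assumes the contention comes from taking the same lock twice per allocation (once to insert the busy area, once more to update its properties after it is already visible), and that a pthread mutex is an acceptable stand-in for the kernel spinlock. All names here (struct area, busy_lock, busy_list, alloc_area_twice(), alloc_area_once(), run_demo()) are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define PER_THREAD 1000

/* Hypothetical stand-in for a busy area; not the kernel's struct vmap_area. */
struct area {
	unsigned long start;
	unsigned long flags;	/* assumed to be read by other threads under busy_lock */
	struct area *next;
};

static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;
static struct area *busy_list;	/* hypothetical stand-in for the busy tree */
static int use_once;		/* selects the variant used by worker() */

/* Two acquisitions: publish the area first, then update it in a second section. */
static struct area *alloc_area_twice(unsigned long start, unsigned long flags)
{
	struct area *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->start = start;
	a->flags = 0;

	pthread_mutex_lock(&busy_lock);	/* 1st: insert busy area */
	a->next = busy_list;
	busy_list = a;
	pthread_mutex_unlock(&busy_lock);

	pthread_mutex_lock(&busy_lock);	/* 2nd: area is already visible */
	a->flags = flags;
	pthread_mutex_unlock(&busy_lock);
	return a;
}

/*
 * One acquisition: the area is fully set up while still private, then
 * published. Insertion stays inside this helper; callers never see the lock.
 */
static struct area *alloc_area_once(unsigned long start, unsigned long flags)
{
	struct area *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->start = start;
	a->flags = flags;

	pthread_mutex_lock(&busy_lock);	/* only section: insert busy area */
	a->next = busy_list;
	busy_list = a;
	pthread_mutex_unlock(&busy_lock);
	return a;
}

static void *worker(void *arg)
{
	unsigned long base = (unsigned long)(uintptr_t)arg * PER_THREAD;

	for (unsigned long i = 0; i < PER_THREAD; i++) {
		if (use_once)
			alloc_area_once(base + i, 1);
		else
			alloc_area_twice(base + i, 1);
	}
	return NULL;
}

/* Runs nthreads allocators, frees all busy areas, returns how many had flags == 1. */
static unsigned long run_demo(int nthreads, int once)
{
	pthread_t tids[64];
	struct area *a, *next;
	unsigned long n = 0;

	assert(nthreads > 0 && nthreads <= 64);
	use_once = once;
	for (int t = 0; t < nthreads; t++)
		pthread_create(&tids[t], NULL, worker, (void *)(uintptr_t)t);
	for (int t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);

	for (a = busy_list; a; a = next) {	/* all threads joined, no lock needed */
		next = a->next;
		n += (a->flags == 1);
		free(a);
	}
	busy_list = NULL;
	return n;
}
```

Both variants end with the same set of busy areas; the difference is that alloc_area_once() takes busy_lock once per allocation instead of twice. Keeping the insertion inside a single helper is one way to shorten the critical section without spreading the insertion logic across its callers.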

--
Uladzislau Rezki