Hello, Rulinhuang!

> Hi Uladzislau and Andrew, we have rebased it(Patch v4) on branch
> mm-unstable and remeasured it. Could you kindly help confirm if
> this is the right base to work on?
> Compared to the previous result at kernel v6.7 with a 5% performance
> gain on intel icelake(160 vcpu), we only had a 0.6% with this commit
> base. But we think our modification still has some significance. On
> the one hand, this does reduce a critical section. On the other hand,
> we have a 4% performance gain on intel sapphire rapids(224 vcpu),
> which suggests more performance improvement would likely be achieved
> when the core count of processors increases to hundreds or
> even thousands.
> Thank you again for your comments.
>
According to the patch, that was a correct rebase. Right, the small
delta on your 160-CPU system comes from removing contention. On bigger
systems the impact is bigger, as you point out with your 224-vCPU
results, where you see a 4% performance improvement. So we should fix it.

However, the way it is fixed is not optimal from my point of view,
because the patch in question now spreads the internals of
alloc_vmap_area(), such as inserting the busy area, across many places.

--
Uladzislau Rezki