On Fri, Aug 04, 2023 at 09:33:48AM +0800, Guo Ren wrote:
> On Thu, Aug 3, 2023 at 7:57 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > CNA should only show a benefit when there is strong inter-node
> > contention, and in that case it is typically best to fix the kernel side
> > locking.
> >
> > Hence the question as to what lock prompted you to look at this.

> I met the long lock queue situation when the hardware gave an overly
> aggressive store queue merge buffer delay mechanism. See:
> https://lore.kernel.org/linux-riscv/20230802164701.192791-8-guoren@xxxxxxxxxx/

*groan*, so you're using it to work around 'broken' hardware :-(

Wouldn't that hardware have horrifically bad lock throughput anyway?
Everybody would end up waiting on that store buffer delay.

> This also let me consider improving the efficiency of the long lock
> queue release. For example, if the queue is like this:
>
> (Node0 cpu0) -> (Node1 cpu64) -> (Node0 cpu1) -> (Node1 cpu65) ->
> (Node0 cpu2) -> (Node1 cpu66) -> ...
>
> Then every mcs_unlock would cause a cross-NUMA transaction. But if we
> could make the queue like this:

See, this is where the ARM64 WFE would come in handy; I don't suppose
RISC-V has anything like that?

Also, by the time you have 6 waiters, I'd say the lock is terribly
contended and you should look at improving the locking scheme.
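
For reference, the queue example quoted above can be put into numbers with a rough model. This is only a sketch: the helper name cross_node_handoffs and the (node, cpu) tuple encoding are made up for illustration, and the grouped ordering is an assumption, since the quote is cut off before it shows its own reordered queue. The model counts a handoff as cross-node whenever two adjacent waiters sit on different nodes; it says nothing about actual latency.

```python
def cross_node_handoffs(queue):
    """Count adjacent waiter pairs whose NUMA nodes differ.

    Each such pair stands for one unlock handoff that has to cross nodes.
    `queue` is a list of (node, cpu) tuples in queue order.
    """
    return sum(1 for (a, _), (b, _) in zip(queue, queue[1:]) if a != b)


# Waiters in the interleaved order from the quoted example.
interleaved = [(0, 0), (1, 64), (0, 1), (1, 65), (0, 2), (1, 66)]

# Hypothetical node-grouped order (assumption, not taken from the thread):
# a stable sort by node keeps the per-node arrival order intact.
grouped = sorted(interleaved, key=lambda waiter: waiter[0])

print("interleaved:", cross_node_handoffs(interleaved))
print("grouped:    ", cross_node_handoffs(grouped))
```

With six waiters the interleaved order makes every handoff cross nodes, while the grouped order crosses once. Grouping trades strict FIFO fairness for locality, which is why it only matters under heavy inter-node contention.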