Hi, Wei.

> On Jun 11, 2019, at 12:22 AM, liwei (GF) <liwei391@xxxxxxxxxx> wrote:
>
> Hi Alex,
>
> On 2019/3/29 23:20, Alex Kogan wrote:
>> In CNA, spinning threads are organized in two queues, a main queue for
>> threads running on the same node as the current lock holder, and a
>> secondary queue for threads running on other nodes. At the unlock time,
>> the lock holder scans the main queue looking for a thread running on
>> the same node. If found (call it thread T), all threads in the main queue
>> between the current lock holder and T are moved to the end of the
>> secondary queue, and the lock is passed to T. If such T is not found, the
>> lock is passed to the first node in the secondary queue. Finally, if the
>> secondary queue is empty, the lock is passed to the next thread in the
>> main queue. For more details, see https://arxiv.org/abs/1810.05600.
>>
>> Note that this variant of CNA may introduce starvation by continuously
>> passing the lock to threads running on the same node. This issue
>> will be addressed later in the series.
>>
>> Enabling CNA is controlled via a new configuration option
>> (NUMA_AWARE_SPINLOCKS), which is enabled by default if NUMA is enabled.
>>
>> Signed-off-by: Alex Kogan <alex.kogan@xxxxxxxxxx>
>> Reviewed-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>> ---
>>  arch/x86/Kconfig                      |  14 +++
>>  include/asm-generic/qspinlock_types.h |  13 +++
>>  kernel/locking/mcs_spinlock.h         |  10 ++
>>  kernel/locking/qspinlock.c            |  29 +++++-
>>  kernel/locking/qspinlock_cna.h        | 173 ++++++++++++++++++++++++++++++++++
>>  5 files changed, 236 insertions(+), 3 deletions(-)
>>  create mode 100644 kernel/locking/qspinlock_cna.h
>>
> (SNIP)
>> +
>> +static __always_inline int get_node_index(struct mcs_spinlock *node)
>> +{
>> +	return decode_count(node->node_and_count++);
>
> When the nesting level is > 4, this does not return an index >= 4; the
> increment instead carries into the NUMA node number and changes it by
> mistake. Execution then takes a wrong path rather than the following
> branch:
>
>         /*
>          * 4 nodes are allocated based on the assumption that there will
>          * not be nested NMIs taking spinlocks. That may not be true in
>          * some architectures even though the chance of needing more than
>          * 4 nodes will still be extremely unlikely. When that happens,
>          * we fall back to spinning on the lock directly without using
>          * any MCS node. This is not the most elegant solution, but is
>          * simple enough.
>          */
>         if (unlikely(idx >= MAX_NODES)) {
>                 while (!queued_spin_trylock(lock))
>                         cpu_relax();
>                 goto release;
>         }

Good point. This patch does not handle count overflows gracefully.

It can easily be fixed by allocating more bits for the count; we do not
really need 30 bits for the NUMA node number (a rough sketch is in the
P.S. below).

However, I am working on a new revision of the patch, in which the CNA
node encapsulates the MCS node (following Peter's suggestion, and
similarly to pv_node; see the last sketch below). With that approach,
this issue goes away.

Best regards,
-- Alex
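
P.S. A few sketches to make the discussion concrete; none of them is the
patch's actual code.

First, the unlock-time hand-off described in the commit message, as
pseudocode. cna_scan_main_queue() and secondary_queue_head() are made-up
helper names; only the order of the three checks follows the commit
message:

        /*
         * Pseudocode for the CNA hand-off at unlock time. Assumes it is
         * only called when at least one waiter is queued behind us.
         */
        static void cna_pass_lock(struct mcs_spinlock *node)
        {
                struct mcs_spinlock *succ;

                /*
                 * Scan the main queue for a waiter on our NUMA node,
                 * splicing the waiters we skip over onto the tail of
                 * the secondary queue.
                 */
                succ = cna_scan_main_queue(node);

                if (!succ)      /* no same-node waiter in the main queue */
                        succ = secondary_queue_head(node);
                if (!succ)      /* secondary queue is empty as well */
                        succ = READ_ONCE(node->next);

                arch_mcs_spin_unlock_contended(&succ->locked);
        }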
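
Second, what I mean by allocating more bits for the count. The widths
below are illustrative only, and CNA_COUNT_BITS is a made-up name; the
point is that a wider count field keeps get_node_index() returning
values >= MAX_NODES under deep nesting, so the trylock fallback fires
instead of the increment carrying into the node field:

        /*
         * Illustrative encoding: the low CNA_COUNT_BITS bits of
         * node_and_count hold the nesting count, the remaining bits
         * hold the NUMA node id. With 6 bits, the count can reach 63
         * before wrapping, so the idx >= MAX_NODES check in
         * queued_spin_lock_slowpath() keeps working.
         */
        #define CNA_COUNT_BITS  6
        #define CNA_COUNT_MASK  ((1U << CNA_COUNT_BITS) - 1)

        static inline unsigned int decode_count(unsigned int node_and_count)
        {
                return node_and_count & CNA_COUNT_MASK;
        }

        static inline unsigned int decode_numa_node(unsigned int node_and_count)
        {
                return node_and_count >> CNA_COUNT_BITS;
        }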
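
Third, the direction of the new revision: the CNA node embeds the MCS
node as its first member, the same way pv_node does in
qspinlock_paravirt.h, so the per-CPU MCS node array and its existing
overflow check stay untouched. The fields other than the embedded mcs
member are placeholders, not the final layout:

        struct cna_node {
                struct mcs_spinlock mcs;          /* must be first: we cast between the two */
                int                 numa_node;    /* placeholder: NUMA node of this waiter */
                u32                 encoded_tail; /* placeholder: CNA bookkeeping */
        };

        static inline struct cna_node *cna_node_of(struct mcs_spinlock *node)
        {
                return container_of(node, struct cna_node, mcs);
        }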