> On Jul 16, 2019, at 10:50 AM, Waiman Long <longman@xxxxxxxxxx> wrote:
>
> On 7/16/19 10:29 AM, Alex Kogan wrote:
>>
>>> On Jul 15, 2019, at 7:22 PM, Waiman Long <longman@xxxxxxxxxx
>>> <mailto:longman@xxxxxxxxxx>> wrote:
>>>
>>> On 7/15/19 5:30 PM, Waiman Long wrote:
>>>>> -#ifndef _GEN_PV_LOCK_SLOWPATH
>>>>> +#if !defined(_GEN_PV_LOCK_SLOWPATH) && !defined(_GEN_CNA_LOCK_SLOWPATH)
>>>>>
>>>>>  #include <linux/smp.h>
>>>>>  #include <linux/bug.h>
>>>>> @@ -77,18 +77,14 @@
>>>>>  #define MAX_NODES	4
>>>>>
>>>>>  /*
>>>>> - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
>>>>> - * size and four of them will fit nicely in one 64-byte cacheline. For
>>>>> - * pvqspinlock, however, we need more space for extra data. To accommodate
>>>>> - * that, we insert two more long words to pad it up to 32 bytes. IOW, only
>>>>> - * two of them can fit in a cacheline in this case. That is OK as it is rare
>>>>> - * to have more than 2 levels of slowpath nesting in actual use. We don't
>>>>> - * want to penalize pvqspinlocks to optimize for a rare case in native
>>>>> - * qspinlocks.
>>>>> + * On 64-bit architectures, the mcs_spinlock structure will be 20 bytes in
>>>>> + * size. For pvqspinlock or the NUMA-aware variant, however, we need more
>>>>> + * space for extra data. To accommodate that, we insert two more long words
>>>>> + * to pad it up to 36 bytes.
>>>>>  */
>>>> The 20 bytes figure is wrong. It is actually 24 bytes for 64-bit, as the
>>>> mcs_spinlock structure is 8-byte aligned. For better cacheline
>>>> alignment, I would like to keep mcs_spinlock at 16 bytes as before.
>>>> Instead, you can use encode_tail() to store the CNA node pointer in
>>>> "locked". For instance, use (encode_tail() << 1) in locked to
>>>> distinguish it from the regular locked=1 value.
>>>
>>> Actually, the encoded tail value is already shifted left by either 16
>>> or 9 bits, so there is no need to shift it again. You can assign it
>>> directly:
>>>
>>>     mcs->locked = cna->encoded_tail;
>>>
>>> You do need to change the type of locked to "unsigned int", though,
>>> for proper comparison with "1".
>>>
>> Got it, thanks.
>>
> I forgot to mention that I would like to see a boot command line option
> to force the numa qspinlock code off, and maybe on as well. This can
> help in testing, as you don't need to build 2 separate kernels, one with
> NUMA_AWARE_SPINLOCKS on and one with it off.

IIUC it should be easy to add a boot option that forces the NUMA-aware
spinlock off even when it is enabled through config, but the other way
around would require compiling in the NUMA-aware spinlock code even when
the config option is disabled. Is that ok?

Also, what should the option name be? "numa_spinlock=on/off" if we want
both directions, or "no_numa_spinlock" if we want just the force-off
option? (A rough sketch of a handler for the two-way form is in the
P.P.S. below.)

> For small 2-socket systems,
> numa qspinlock may not help much.

It actually helps quite a bit, e.g., speedups of 42-57% on will-it-scale
on a dual-socket x86 system. We have numbers and plots in our paper on
arXiv.

Regards,
— Alex
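P.S. Just to confirm I understood the encoded-tail suggestion, here is a
minimal sketch. The cna_node layout and the helper below are mine, for
illustration only, not actual patch code:

    struct mcs_spinlock {
            struct mcs_spinlock *next;
            unsigned int locked;    /* was int; holds 1 or an encoded tail */
            int count;              /* nesting count, as in qspinlock.c */
    };

    struct cna_node {
            struct mcs_spinlock     mcs;
            unsigned int            encoded_tail;   /* encode_tail(cpu, idx),
                                                       precomputed at init */
    };

    static inline void cna_pass_lock(struct cna_node *node,
                                     struct mcs_spinlock *next)
    {
            /*
             * The encoded tail is already shifted left by 9 or 16
             * bits, so any valid value is > 1 and the successor can
             * tell it apart from the regular locked == 1 handoff.
             */
            WRITE_ONCE(next->locked, node->encoded_tail);
    }

The successor side would then do something like

    u32 val = READ_ONCE(node->locked);
    if (val > 1)
            /* an encoded tail was passed; recover the CNA node */
            cn = (struct cna_node *)decode_tail(val);

and mcs_spinlock stays at 16 bytes, so four nodes still fit in one
64-byte cacheline.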
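P.P.S. For the boot option, assuming we go with the two-way
"numa_spinlock=" form, I would expect the handler to look roughly like
this (the flag name is made up, and the exact semantics are still open
per the question above):

    #include <linux/init.h>         /* __setup() */
    #include <linux/string.h>       /* strcmp() */

    /* 0: decided by config, 1: forced on, -1: forced off */
    static int numa_spinlock_flag;

    static int __init numa_spinlock_setup(char *str)
    {
            if (!strcmp(str, "on")) {
                    numa_spinlock_flag = 1;
                    return 1;
            }
            if (!strcmp(str, "off")) {
                    numa_spinlock_flag = -1;
                    return 1;
            }
            return 0;       /* unrecognized value */
    }
    __setup("numa_spinlock=", numa_spinlock_setup);

The "off" case only needs this flag checked before the CNA slowpath is
selected; the "on" case is the one that would force the CNA code to be
compiled in even when the config option is disabled, as noted above.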