On Wed, Apr 03, 2019 at 11:39:09AM -0400, Alex Kogan wrote:

> >> The patch that I am looking for is to have a separate
> >> numa_queued_spinlock_slowpath() that coexists with
> >> native_queued_spinlock_slowpath() and
> >> paravirt_queued_spinlock_slowpath(). At boot time, we select the most
> >> appropriate one for the system at hand.
> Is this how this selection works today for paravirt?
> I see a PARAVIRT_SPINLOCKS config option, but IIUC you are talking
> about a different mechanism here.
> Can you, please, elaborate or give me a link to a page that explains that?

Oh man, you ask us to explain how paravirt patching works... that's
magic :-)

Basically, the compiler will emit a bunch of indirect calls to the
various pv_ops.*.* functions.

Then, at alternative_instructions() <- apply_paravirt() it will rewrite
all these indirect calls to direct calls to the function pointers that
are in the pv_ops structure at that time (+- more magic).

So we initialize the pv_ops.lock.* methods to the normal
native_queued_spin*() stuff; if KVM/Xen/whatever setup detects pv
spinlock support, it changes the methods to the paravirt_queued_*() stuff.

If you want more details, you'll just have to read
arch/x86/include/asm/paravirt*.h and arch/x86/kernel/paravirt*.c; I
don't think there's a coherent writeup of all that.

> > Agreed; and until we have static_call, I think we can abuse the paravirt
> > stuff for this.
> >
> > By the time we patch the paravirt stuff:
> >
> >   check_bugs()
> >     alternative_instructions()
> >       apply_paravirt()
> >
> > we should already have enumerated the NODE topology and so nr_node_ids()
> > should be set.
> >
> > So if we frob pv_ops.lock.queued_spin_lock_slowpath to
> > numa_queued_spin_lock_slowpath before that, it should all get patched
> > just right.
> >
> > That of course means the whole NUMA_AWARE_SPINLOCKS thing depends on
> > PARAVIRT_SPINLOCK, which is a bit awkward…

> Just to mention here, the patch so far does not address paravirt, but
> our goal is to add this support once we address all the concerns for
> the native version. So we will end up with four variants for the
> queued_spinlock_slowpath() - one for each combination of
> native/paravirt and NUMA/non-NUMA. Or perhaps we do not need a
> NUMA/paravirt variant?

I wouldn't bother with a pv version of the numa aware code at all. If
you have overcommitted guests, topology is likely irrelevant anyway. If
you have 1:1 pinned guests, they'll not use pv spinlocks anyway.

So keep it to a three-way choice:

 - native
 - native/numa
 - paravirt
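
To make the shape of that three-way choice concrete, here is a rough
userspace sketch, not kernel code. Everything in it is a hypothetical
stand-in: struct lock_ops models pv_ops.lock, the three *_slowpath()
functions only return a tag instead of taking a lock, and
apply_patching() models apply_paravirt() by snapshotting the pointer
rather than rewriting call sites. The selection rule (pv locks win,
otherwise NUMA when more than one node was enumerated) is an assumption
made for illustration.

```c
#include <stdbool.h>

/* Tags identifying which variant ran; purely illustrative. */
enum variant { NATIVE, NATIVE_NUMA, PARAVIRT };

/* Hypothetical stand-ins for the three slowpath variants. The real
 * functions take a lock pointer and a value and return nothing. */
static enum variant native_slowpath(void)   { return NATIVE; }
static enum variant numa_slowpath(void)     { return NATIVE_NUMA; }
static enum variant paravirt_slowpath(void) { return PARAVIRT; }

/* Toy model of pv_ops.lock: one method, initialized to native. */
struct lock_ops {
    enum variant (*queued_spin_lock_slowpath)(void);
};

static struct lock_ops lock_ops = {
    .queued_spin_lock_slowpath = native_slowpath,
};

/* Models the direct call target baked in at patch time. */
static enum variant (*patched_slowpath)(void);

/* Must run before apply_patching(); assumed rule, see lead-in. */
static void select_slowpath(bool hypervisor_pv_locks, int nr_nodes)
{
    if (hypervisor_pv_locks)
        lock_ops.queued_spin_lock_slowpath = paravirt_slowpath;
    else if (nr_nodes > 1)
        lock_ops.queued_spin_lock_slowpath = numa_slowpath;
    else
        lock_ops.queued_spin_lock_slowpath = native_slowpath;
}

/* Models apply_paravirt(): whatever the table holds now is what all
 * later calls use; changing the table afterwards has no effect. */
static void apply_patching(void)
{
    patched_slowpath = lock_ops.queued_spin_lock_slowpath;
}
```

Calling select_slowpath(false, 4) followed by apply_patching() leaves
patched_slowpath pointing at the NUMA variant. A later
select_slowpath(true, 4) without re-patching changes nothing for
callers, which is why the pointer has to be frobbed before the patching
step.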