On 03/04/2019 18:01, Peter Zijlstra wrote: > On Wed, Apr 03, 2019 at 11:39:09AM -0400, Alex Kogan wrote: > >>>> The patch that I am looking for is to have a separate >>>> numa_queued_spinlock_slowpath() that coexists with >>>> native_queued_spinlock_slowpath() and >>>> paravirt_queued_spinlock_slowpath(). At boot time, we select the most >>>> appropriate one for the system at hand. >> Is this how this selection works today for paravirt? >> I see a PARAVIRT_SPINLOCKS config option, but IIUC you are talking about a different mechanism here. >> Can you, please, elaborate or give me a link to a page that explains that? > > Oh man, you ask us to explain how paravirt patching works... that's > magic :-) > > Basically, the compiler will emit a bunch of indirect calls to the > various pv_ops.*.* functions. > > Then, at alternative_instructions() <- apply_paravirt() it will rewrite > all these indirect calls to direct calls to the function pointers that > are in the pv_ops structure at that time (+- more magic). > > So we initialize the pv_ops.lock.* methods to the normal > native_queued_spin*() stuff, if KVM/Xen/whatever setup detectors pv > spnlock support changes the methods to the paravirt_queued_*() stuff. > > If you wnt more details, you'll just have to read > arch/x86/include/asm/paravirt*.h and arch/x86/kernel/paravirt*.c, I > don't think there's a coherent writeup of all that. > >>> Agreed; and until we have static_call, I think we can abuse the paravirt >>> stuff for this. >>> >>> By the time we patch the paravirt stuff: >>> >>> check_bugs() >>> alternative_instructions() >>> apply_paravirt() >>> >>> we should already have enumerated the NODE topology and so nr_node_ids() >>> should be set. >>> >>> So if we frob pv_ops.lock.queued_spin_lock_slowpath to >>> numa_queued_spin_lock_slowpath before that, it should all get patched >>> just right. >>> >>> That of course means the whole NUMA_AWARE_SPINLOCKS thing depends on >>> PARAVIRT_SPINLOCK, which is a bit awkward… > >> Just to mention here, the patch so far does not address paravirt, but >> our goal is to add this support once we address all the concerns for >> the native version. So we will end up with four variants for the >> queued_spinlock_slowpath() — one for each combination of >> native/paravirt and NUMA/non-NUMA. Or perhaps we do not need a >> NUMA/paravirt variant? > > I wouldn't bother with a pv version of the numa aware code at all. If > you have overcommitted guests, topology is likely irrelevant anyway. If > you have 1:1 pinned guests, they'll not use pv spinlocks anyway. > > So keep it to tertiary choice: > > - native > - native/numa > - paravirt Just for the records: the paravirt variant could easily choose whether it wants to include a numa version just by using the existing hooks. With PARAVIRT_SPINLOCK configured I guess even the native case would need to use the paravirt hooks for selection of native or native/numa. Without PARAVIRT_SPINLOCK this would be just an alternative() then? Maybe the resulting code would be much more readable if we'd just make PARAVIRT_SPINLOCK usable without the other PARAVIRT hooks? So splitting up PARAVIRT into PARAVIRT_GUEST (timer hooks et al) and the patching infrastructure, with PARAVIRT_GUEST and PARAVIRT_SPINLOCK selecting PARAVIRT, and PARAVIRT_XXL selecting PARAVIRT_GUEST. Juergen