On Fri, Aug 27, 2021 at 02:49:50PM +0000, Sean Christopherson wrote:
> On Fri, Aug 27, 2021, Peter Zijlstra wrote:
> > On Thu, Aug 26, 2021 at 05:57:08PM -0700, Sean Christopherson wrote:
> > > Use a per-CPU pointer to track perf's guest callbacks so that KVM can set
> > > the callbacks more precisely and avoid a lurking NULL pointer dereference.
> >
> > I'm completely failing to see how per-cpu helps anything here...
>
> It doesn't help until KVM is converted to set the per-cpu pointer in flows that
> are protected against preemption, and more specifically when KVM only writes to
> the pointer from the owning CPU.

So the 'problem' I have with this is that sane (!KVM using) people will
still have to suffer that load, whereas with the static_call() we patch
in an 'xor %rax,%rax' and only have immediate code flow.

> Ignoring static call for the moment, I don't see how the unreg side can be safe
> using a bare single global pointer. There is no way for KVM to prevent an NMI
> from running in parallel on a different CPU. If there's a more elegant solution,
> especially something that can be backported, e.g. an rcu-protected pointer, I'm
> all for it. I went down the per-cpu path because it allowed for cleanups in KVM,
> but similar cleanups can be done without per-cpu perf callbacks.

If all the perf_guest_cbs dereferences are with preemption disabled
(IRQs disabled, IRQ context, NMI context included), then the sequence:

	WRITE_ONCE(perf_guest_cbs, NULL);
	synchronize_rcu();

ensures that all prior observers of perf_guest_cbs will have completed and
all future observers must observe the NULL value.
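A minimal userspace sketch of that unregister pattern follows. It assumes C11 seq_cst atomics as stand-ins for READ_ONCE()/WRITE_ONCE(), and a toy reader counter as a stand-in for the "preemption disabled" read side plus synchronize_rcu(). All names (guest_cbs, toy_synchronize, unregister_cbs, etc.) are hypothetical and are not kernel API; real RCU does not count readers this way.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for perf's guest callback table; the real
 * struct perf_guest_info_callbacks has different members. */
struct guest_cbs {
	int (*state)(void);
};

/* Global pointer, playing the role of perf_guest_cbs. */
static _Atomic(struct guest_cbs *) guest_cbs_ptr = NULL;

/* Toy grace-period tracking: number of readers currently inside a
 * read-side section. This only models "wait for all pre-existing
 * readers to finish"; it is NOT how kernel RCU is implemented. */
static atomic_int active_readers;

static void toy_read_lock(void)   { atomic_fetch_add(&active_readers, 1); }
static void toy_read_unlock(void) { atomic_fetch_sub(&active_readers, 1); }

/* Spin until no reader is inside a read-side section. With seq_cst
 * ordering, a reader that could have loaded the old pointer had already
 * incremented the counter, so observing zero means it has finished. */
static void toy_synchronize(void)
{
	while (atomic_load(&active_readers) != 0)
		; /* busy-wait */
}

/* Reader side (think: the NMI handler). The pointer is loaded exactly
 * once, so the NULL check and the call use the same value. */
static int guest_state(void)
{
	int ret = 0;

	toy_read_lock();
	struct guest_cbs *cbs = atomic_load(&guest_cbs_ptr);
	if (cbs)
		ret = cbs->state();
	toy_read_unlock();
	return ret;
}

static void register_cbs(struct guest_cbs *cbs)
{
	atomic_store(&guest_cbs_ptr, cbs);
}

/* Writer side: publish NULL first, then wait out every reader that may
 * still hold the old pointer. Only after this returns may the caller
 * free or unload the callbacks. */
static void unregister_cbs(void)
{
	atomic_store(&guest_cbs_ptr, NULL); /* ~ WRITE_ONCE(perf_guest_cbs, NULL) */
	toy_synchronize();                  /* ~ synchronize_rcu() */
}

/* Hypothetical callbacks a KVM-like user would register. */
static int kvm_like_state(void) { return 1; }
static struct guest_cbs kvm_like_cbs = { .state = kvm_like_state };
```

The order of the two writer steps is what matters: once the NULL store is visible, new readers take the NULL branch, and the wait only has to cover readers that started before it.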