On Thu, Dec 03, 2020, David Woodhouse wrote:
> On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > > On IRC, Paolo told me that permanent pinning causes problems for memory
> > > hotplug, and pointed me at the trick we do with an MMU notifier and
> > > kvm_vcpu_reload_apic_access_page().
> >
> > Okay that answers my question. Thanks for clearing that up.
> >
> > Not sure of a good place to document this but it would be good to
> > have this written down somewhere. Maybe kvm_map_gfn()?
>
> Trying not to get too distracted by polishing this part, so I can
> continue with making more things actually work. But I took a quick look
> at the reload_apic_access_page() thing.
>
> AFAICT it works because the access is only from *within* the vCPU, in
> guest mode.
>
> So all the notifier has to do is kick all CPUs, which happens when it
> calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
> are *out* of guest mode by the time...
>
> ...er... maybe not by the time the notifier returns, because all
> we've done is *send* the IPI and we don't know the other CPUs have
> actually stopped running the guest yet?
>
> Maybe there's some explanation of why the actual TLB shootdown
> truly *will* occur before the page goes away, and some ordering
> rules which mean our reschedule IPI will happen first? Something
> like that ideally would have been in a comment in in MMU notifier.

KVM_REQ_APIC_PAGE_RELOAD is tagged with KVM_REQUEST_WAIT, which means that
kvm_kick_many_cpus() and thus smp_call_function_many() will have @wait=true,
i.e. the sender will wait for the SMP function call to finish on the target
CPUs.
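To make the send-vs-wait distinction concrete, here is a rough userspace sketch. It is not KVM code: struct vcpu, the kicked/acked flags, and kick_all_vcpus() are all made up for illustration, and pthreads stand in for vCPUs. The only point is that with wait=true the sender does not return until every target has acknowledged the kick, whereas with wait=false it may return while some targets are still "in the guest".

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_VCPUS 4

/* Hypothetical stand-in for a vCPU; not the kernel's struct kvm_vcpu. */
struct vcpu {
	atomic_bool kicked;	/* set by the sender: the "IPI" */
	atomic_bool acked;	/* set by the target once it has "exited" */
};

/* Target side: stay in "guest mode" until kicked, then acknowledge. */
static void *vcpu_run(void *arg)
{
	struct vcpu *v = arg;

	while (!atomic_load(&v->kicked))
		sched_yield();
	atomic_store(&v->acked, true);
	return NULL;
}

/*
 * Sender side: kick every vCPU.  With wait == true, also spin until each
 * one has acknowledged, loosely mirroring @wait=true.  Returns how many
 * vCPUs had acknowledged at the moment the sender would have returned.
 */
int kick_all_vcpus(bool wait)
{
	struct vcpu vcpus[NR_VCPUS];
	pthread_t threads[NR_VCPUS];
	int acked_at_return = 0;
	int i, rc;

	for (i = 0; i < NR_VCPUS; i++) {
		atomic_init(&vcpus[i].kicked, false);
		atomic_init(&vcpus[i].acked, false);
		rc = pthread_create(&threads[i], NULL, vcpu_run, &vcpus[i]);
		assert(rc == 0);
		(void)rc;
	}

	/* "Send the IPI" to every target. */
	for (i = 0; i < NR_VCPUS; i++)
		atomic_store(&vcpus[i].kicked, true);

	/* Only a waiting sender blocks until the targets have responded. */
	if (wait) {
		for (i = 0; i < NR_VCPUS; i++)
			while (!atomic_load(&vcpus[i].acked))
				sched_yield();
	}

	/* Snapshot what the sender can rely on at this point. */
	for (i = 0; i < NR_VCPUS; i++)
		acked_at_return += atomic_load(&vcpus[i].acked) ? 1 : 0;

	/* Cleanup only; not part of the modelled behaviour. */
	for (i = 0; i < NR_VCPUS; i++)
		pthread_join(threads[i], NULL);

	return acked_at_return;
}
```

In this toy model kick_all_vcpus(true) always returns NR_VCPUS, while kick_all_vcpus(false) can return anything from 0 to NR_VCPUS depending on scheduling, which is the window David was worried about.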