On Mon, Jul 18, 2022, Sean Christopherson wrote: > On Sat, Jul 16, 2022, Mingwei Zhang wrote: > > On Fri, Jul 15, 2022, Sean Christopherson wrote: > > > Add a comment to document how host_pfn_mapping_level() can be used safely, > > > as the line between safe and dangerous is quite thin. E.g. if KVM were > > > to ever support in-place promotion to create huge pages, consuming the > > > level is safe if the caller holds mmu_lock and checks that there's an > > > existing _leaf_ SPTE, but unsafe if the caller only checks that there's a > > > non-leaf SPTE. > > > > > > Opportunistically tweak the existing comments to explicitly document why > > > KVM needs to use READ_ONCE(). > > > > > > No functional change intended. > > > > > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > > > --- > > > arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++------- > > > 1 file changed, 35 insertions(+), 7 deletions(-) > > > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index bebff1d5acd4..d5b644f3e003 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) > > > __direct_pte_prefetch(vcpu, sp, sptep); > > > } > > > > > > +/* > > > + * Lookup the mapping level for @gfn in the current mm. > > > + * > > > + * WARNING! Use of host_pfn_mapping_level() requires the caller and the end > > > + * consumer to be tied into KVM's handlers for MMU notifier events! > > Since calling this function won't cause kernel crash now, I guess we can > > remove the warning sign here, but keep the remaining statement since it > > is necessary. > > Calling this function won't _directly_ crash the kernel, but improper usage can > most definitely crash the host kernel, or even worse, silently corrupt host and > or guest data. E.g. if KVM were to race with an mmu_notifier event and incorrectly > map a stale huge page into the guest. > > So yes, the function itself is robust, but usage is still very subtle and delicate. Understood. So we basically create another "gup_fast_only()" within KVM and we worry that may confuse other developers so we add the warning sign. > > > > + * > > > + * There are several ways to safely use this helper: > > > + * > > > + * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before > > > + * consuming it. In this case, mmu_lock doesn't need to be held during the > > > + * lookup, but it does need to be held while checking the MMU notifier. > > > > but it does need to be held while checking the MMU notifier and > > consuming the result. > > I didn't want to include "consuming the result" because arguably the result is > being consumed while running the guest, and obviously KVM doesn't hold mmu_lock > while running the guest (though I fully acknowledge the above effectively uses > "consume" in the sense of shoving the result into SPTEs). > > > > + * > > > + * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation > > > + * event for the hva. This can be done by explicit checking the MMU notifier > > s/explicit/explicitly > > > > + * or by ensuring that KVM already has a valid mapping that covers the hva. > > > > Yes, more specifically, "mmu notifier sequence counter". > > Heh, depends on what the reader interprets as "sequence counter". If the reader > interprets that as the literal sequence counter, mmu_notifier_seq, then this phrasing > is incorrect as mmu_notifier_seq isn't bumped until the invalidation completes, > i.e. it guards against _past_ invalidations, not in-progress validations. > > My preference is to intentionally not be precise in describing how to check for an > in-progress invalidation, e.g. so that this comment doesn't need to be updated if > the details change, and to also to try and force developers to do more than copy > and paste if they want to use this helper. Hmm, I was going to say that I strongly disagree about the intentional unclearness. But then I find that MMU notifier implementation does require more than just the counter but also the range, so yeah, talking too much may fall into the weeds. But in general, I think mmu notifier deserves better documentation in both concept and implementation in KVM.