On Thu, Jul 20, 2023, isaku.yamahata@xxxxxxxxx wrote: > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index a73ddb43a2cf..35bb14363828 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, > struct kvm_page_fault *fault) > { > int max_order, r; > + u8 max_level; > > if (!kvm_slot_can_be_private(fault->slot)) > return kvm_do_memory_fault_exit(vcpu, fault); > @@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, > if (r) > return r; > > - fault->max_level = min(kvm_max_level_for_order(max_order), > - fault->max_level); > + max_level = kvm_max_level_for_order(max_order); > + r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn, > + fault->gfn, &max_level); I think KVM should hook gmem itself, i.e. convert pages to private when they're allocated, and then back to shared when they're freed. I don't like having asymmetric APIs (convert on fault instead of allocate, but then convert back on free instead of on zap), and hooking the page fault path also violates the approach of tying the RMP/HKID to the physical allocation. Practically speaking, hooking the fault path will result in undesirable behavior. Just because KVM *maps* at 4KiB granularity doesn't mean the RMP must be assigned at 4KiB granularity, e.g. if userspace chooses to *not* PUNCH_HOLE when the guest shares a single 4KiB page in a 2MiB chunk. Dirty logging is another case where the RMP can stay at 2MiB. Or as a very silly example, imagine userspace pulls a stupid and covers a single 2MiB chunk of gmem with 512 memslots. That likely means KVM will need an additional hook to clamp the max_level at the RMP, but that shouldn't be a big deal, e.g. if we convert on allocation, then KVM should be able to blindly do the conversion because it would be a kernel bug if the page is already assigned to an ASID in the RMP, i.e. the additional hook shouldn't incur an extra RMP lookup.