On Thu, Dec 12, 2019 at 9:34 AM Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Wed, Dec 11, 2019 at 04:32:07PM -0500, Barret Rhoden wrote:
> > This change allows KVM to map DAX-backed files made of huge pages with
> > huge mappings in the EPT/TDP.
> >
> > DAX pages are not PageTransCompound. The existing check is trying to
> > determine if the mapping for the pfn is a huge mapping or not. For
> > non-DAX maps, e.g. hugetlbfs, that means checking PageTransCompound.
> > For DAX, we can check the page table itself.
> >
> > Note that KVM already faulted in the page (or huge page) in the host's
> > page table, and we hold the KVM mmu spinlock. We grabbed that lock in
> > kvm_mmu_notifier_invalidate_range_end, before checking the mmu seq.
> >
> > Signed-off-by: Barret Rhoden <brho@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 6f92b40d798c..cd07bc4e595f 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3384,6 +3384,35 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> >  	return -EFAULT;
> >  }
> >
> > +static bool pfn_is_huge_mapped(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +	unsigned long hva;
> > +
> > +	if (!is_zone_device_page(page))
> > +		return PageTransCompoundMap(page);
> > +
> > +	/*
> > +	 * DAX pages do not use compound pages. The page should have already
> > +	 * been mapped into the host-side page table during try_async_pf(), so
> > +	 * we can check the page tables directly.
> > +	 */
> > +	hva = gfn_to_hva(kvm, gfn);
> > +	if (kvm_is_error_hva(hva))
> > +		return false;
> > +
> > +	/*
> > +	 * Our caller grabbed the KVM mmu_lock with a successful
> > +	 * mmu_notifier_retry, so we're safe to walk the page table.
> > +	 */
> > +	switch (dev_pagemap_mapping_shift(hva, current->mm)) {
> > +	case PMD_SHIFT:
> > +	case PUD_SIZE:
>
> I assume this means DAX can have 1GB pages?

Correct, it can. Not in the filesystem-dax case, but device-dax supports 1GB pages.

> I ask because KVM's THP logic
> has historically relied on THP only supporting 2MB. I cleaned this up in
> a recent series[*], which is in kvm/queue, but I obviously didn't actually
> test whether or not KVM would correctly handle 1GB non-hugetlbfs pages.

Yeah, since device-dax is the only path to support longterm page pinning for vfio device assignment, testing with device-dax + 1GB pages would be a useful sanity check.
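On the 1GB case: the switch in the patch compares the value returned by dev_pagemap_mapping_shift(), which is a shift, so the second label presumably wants to be PUD_SHIFT rather than PUD_SIZE (a byte count that a shift value never equals). Below is a minimal userspace sketch of that classification, assuming x86-64 with 4K base pages (PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30, 9 index bits per level). `enum map_level`, `shift_to_level()` and `level_to_bytes()` are hypothetical names for illustration only, not KVM's actual API.

```c
#include <assert.h>

/*
 * Assumed x86-64 values with 4K base pages. The kernel defines these in
 * its own headers; they are redefined here only so the sketch builds in
 * userspace.
 */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30
#define PUD_SIZE   (1UL << PUD_SHIFT)

/* Hypothetical stand-in for KVM's page-level enum. */
enum map_level {
	MAP_LEVEL_NONE = 0,
	MAP_LEVEL_4K   = 1,
	MAP_LEVEL_2M   = 2,
	MAP_LEVEL_1G   = 3,
};

/*
 * Classify the shift reported by a host page-table walk. Every case label
 * is a shift: comparing against PUD_SIZE (1UL << 30) would never match.
 */
static enum map_level shift_to_level(unsigned long shift)
{
	switch (shift) {
	case PAGE_SHIFT:
		return MAP_LEVEL_4K;
	case PMD_SHIFT:
		return MAP_LEVEL_2M;
	case PUD_SHIFT:
		return MAP_LEVEL_1G;
	default:
		return MAP_LEVEL_NONE; /* not mapped, or an unexpected size */
	}
}

/* Bytes covered by one mapping at the given level (9 index bits per level). */
static unsigned long level_to_bytes(enum map_level level)
{
	assert(level != MAP_LEVEL_NONE);
	return 1UL << (PAGE_SHIFT + 9 * (level - 1));
}
```

With these definitions, shift_to_level(PUD_SHIFT) yields MAP_LEVEL_1G, while shift_to_level(PUD_SIZE) falls through to MAP_LEVEL_NONE, which is how a 1GB device-dax mapping would silently be treated as not huge.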