On Fri, Sep 22, 2023, Michael Roth wrote:
> On Thu, Sep 21, 2023 at 01:33:23PM -0700, Sean Christopherson wrote:
> > +        /*
> > +         * For simplicity, allow mapping a hugepage if and only if the entire
> > +         * binding is compatible, i.e. don't bother supporting mapping interior
> > +         * sub-ranges with hugepages (unless userspace comes up with a *really*
> > +         * strong use case for needing hugepages within unaligned bindings).
> > +         */
> > +        if (!IS_ALIGNED(slot->gmem.pgoff, 1ull << *max_order) ||
> > +            !IS_ALIGNED(slot->npages, 1ull << *max_order))
> > +                *max_order = 0;
>
> Thanks for working this in. Unfortunately on x86 the bulk of guest memory
> ends up getting slotted directly above legacy regions at GFN 0x100,

Can you provide an example?  I'm struggling to understand what the layout
actually is.  I don't think it changes the story for the kernel, but it sounds
like there might be room for optimization in QEMU?  Or more likely, I just
don't understand what you're saying :-)

> so the associated slot still ends up failing these alignment checks if
> userspace tries to match the gmemfd offsets up with the shared RAM/memfd
> offsets.
>
> I tried to work around it in userspace by padding the gmemfd offset of
> each slot to the next 2M boundary, but that also requires dynamically
> growing the gmemfd inode to account for the padding of each new slot, and
> it gets ugly enough that I'm not sure it's any better than your
> suggested alternative of using a unique gmemfd for each slot.
>
> But what if we relax the check to simply make sure that any large folio
> is fully contained by the range the slot is bound to?  It *seems* like
> that would still avoid stuff like mapping 2M pages in the NPT (or
> setting up 2M RMP table entries) that aren't fully contained by a slot,
> while still allowing the bulk of guest memory to get mapped as 2M.  Are
> there other edge cases to consider?
>
> The following seems to work for a basic 16GB SNP guest at least:
>
> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> index 9109bf5751ee..e73128d4ebc2 100644
> --- a/virt/kvm/guest_mem.c
> +++ b/virt/kvm/guest_mem.c
> @@ -618,6 +618,7 @@ int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                          gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep)
>  {
>          pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
> +        pgoff_t huge_index;
>          struct kvm_gmem *gmem;
>          struct folio *folio;
>          struct page *page;
> @@ -662,9 +663,12 @@ int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>           * sub-ranges with hugepages (unless userspace comes up with a *really*
>           * strong use case for needing hugepages within unaligned bindings).
>           */
> -        if (!IS_ALIGNED(slot->gmem.pgoff, 1ull << *max_order) ||
> -            !IS_ALIGNED(slot->npages, 1ull << *max_order))
> +        huge_index = round_down(index, 1ull << *max_order);

Why not use ALIGN() here?  The size is obviously a power-of-2.  Or is my math
even worse than I thought?

> +        if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
> +            huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages) {

Argh, I keep forgetting that the MMU is responsible for handling misaligned
gfns.  Yeah, this looks right.

Can you post this as a proper patch, on top of my fixes?  And without the
pr_debug().  That'll be the easiest for me to apply+squash when the time comes.

Thanks much!
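
As a sanity check on the math, here's a throwaway userspace sketch (not kernel
code, and not part of any patch) of the strict vs. relaxed rules being
discussed.  The slot geometry is made up purely for illustration (a binding at
a non-2M-aligned gmem offset of 0x100 pages, mirroring the "RAM just above the
legacy regions" layout), and ALIGN_UP()/ROUND_DOWN() are local stand-ins for
the kernel's ALIGN()/round_down():

#include <stdio.h>
#include <stdint.h>

/* Local stand-ins for the kernel macros; 'a' must be a power of 2. */
#define ALIGN_UP(x, a)    (((x) + (a) - 1) & ~((uint64_t)(a) - 1))
#define ROUND_DOWN(x, a)  ((x) & ~((uint64_t)(a) - 1))

int main(void)
{
        /* Made-up slot geometry: guest RAM bound just above the legacy regions. */
        const uint64_t max_order = 9;                   /* 2M folios == 512 4K pages */
        const uint64_t nr = 1ull << max_order;
        const uint64_t base_gfn = 0x100, pgoff = 0x100, npages = 0x7ff00;
        const uint64_t gfns[] = { 0x100, 0x1ff, 0x200, 0x40000, 0x7ffff };

        /* Strict rule: the entire binding must be 2M-aligned (fails for this slot). */
        int strict_ok = !(pgoff & (nr - 1)) && !(npages & (nr - 1));

        for (unsigned int i = 0; i < sizeof(gfns) / sizeof(gfns[0]); i++) {
                uint64_t index = gfns[i] - base_gfn + pgoff;
                uint64_t huge_index = ROUND_DOWN(index, nr);

                /* Relaxed rule: only the folio containing @index must fit in the binding. */
                int relaxed_ok = huge_index >= ALIGN_UP(pgoff, nr) &&
                                 huge_index + nr <= pgoff + npages;

                printf("gfn 0x%-6llx: strict=%d relaxed=%d\n",
                       (unsigned long long)gfns[i], strict_ok, relaxed_ok);
        }
        return 0;
}

With this (hypothetical) geometry, the strict check disallows 2M mappings for
the entire slot, while the relaxed check only rejects the unaligned head below
gfn 0x200 and would likewise reject an unaligned tail.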