On Mon, 2022-11-14 at 16:33 +0000, Sean Christopherson wrote:
> On Mon, Nov 14, 2022, Woodhouse, David wrote:
> > Most other data structures, including the pvclock info (both Xen and
> > native KVM), could potentially cross page boundaries. And isn't that
> > also true for things that we'd want to use the GPC for in nesting?
>
> Off the top of my head, no. Except for MSR and I/O permission bitmaps, which
> are >4KiB, things that are referenced by physical address are <=4KiB and must be
> naturally aligned. nVMX does temporarily map L1's MSR bitmap, but that could be
> split into two separate mappings if necessary.
>
> > For the runstate info I suggested reverting commit a795cd43c5b5 but
> > that doesn't actually work because it still has the same problem. Even
> > the gfn-to-hva cache still only really works for a single page, and
> > things like kvm_write_guest_offset_cached() will fall back to using
> > kvm_write_guest() in the case where it crosses a page boundary.
> >
> > I'm wondering if the better fix is to allow the GPC to map more than
> > one page.
>
> I agree that KVM should drop the "no page splits" restriction, but I don't think
> that would necessarily solve all KVM Xen issues. KVM still needs to precisely
> handle the "correct" struct size, e.g. if one of the structs is placed at the very
> end of the page such that the smaller compat version doesn't split a page but the
> 64-bit version does.

I think we can be more explicit that the guest 'long' mode shall never change
while anything is mapped. Xen automatically detects that a guest is in 64-bit
mode very early on, either in the first 'fill the hypercall page' MSR write, or
when setting HVM_PARAM_CALLBACK_IRQ to configure interrupt routing.

Strictly speaking, a guest could put itself into 32-bit mode and set
HVM_PARAM_CALLBACK_IRQ *again*. Xen would only update the wallclock time in
that case, and make no attempt to convert anything else. I don't think we need
to replicate that.
On kexec/soft reset it could go back to 32-bit mode, but the soft reset unmaps
everything, so that's OK.

I looked at making the GPC handle multiple pages, but I can't see how to sanely
do it for the IOMEM case. vmap() takes a list of *pages*, not PFNs, and
memremap_pages() is... overly complex.

But if we can reduce it to *just* the runstate info that potentially needs more
than one page, then we can probably handle that by using two GPC (or maybe GHC)
caches for it.