On Tue, Apr 05, 2022, Andy Lutomirski wrote:
> On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote:
> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> >> The best I can come up with is a special type of shared page that is not
> >> GUP-able and maybe not even mmappable, having a clear option for
> >> transitions to fail, and generally preventing the nasty cases from
> >> happening in the first place.
> >
> > Right, that sounds reasonable to me.
>
> At least as a v1, this is probably more straightforward than allowing mmap().
> Also, there's much to be said for a simpler, limited API, to be expanded if
> genuinely needed, as opposed to starting out with a very featureful API.

Regarding "genuinely needed", IMO the same applies to supporting this at all.
Without numbers from something at least approximating a real use case, we're
just speculating on which will be the most performant approach.

> >> Maybe there could be a special mode for the private memory fds in which
> >> specific pages are marked as "managed by this fd but actually shared".
> >> pread() and pwrite() would work on those pages, but not mmap(). (Or maybe
> >> mmap() but the resulting mappings would not permit GUP.) And
> >> transitioning them would be a special operation on the fd that is specific
> >> to pKVM and wouldn't work on TDX or SEV.
> >
> > Aha, didn't think of pread()/pwrite(). Very interesting.
>
> There are plenty of use cases for which pread()/pwrite()/splice() will be as
> fast or even much faster than mmap()+memcpy().

...

> resume guest
> *** host -> hypervisor -> guest ***
> Guest unshares the page.
> *** guest -> hypervisor ***
> Hypervisor removes PTE. TLBI.
> *** hypervisor -> guest ***
>
> Obviously considerable cleverness is needed to make a virt IOMMU like this
> work well, but still.
>
> Anyway, my suggestion is that the fd backing proposal get slightly modified
> to get it ready for multiple subtypes of backing object, which should be a
> pretty minimal change. Then, if someone actually needs any of this
> cleverness, it can be added later. In the mean time, the
> pread()/pwrite()/splice() scheme is pretty good.

Tangentially related to getting private-fd ready for multiple things, what
about implementing the pread()/pwrite()/splice() scheme in pKVM itself? I.e.
read() on the VM fd, with the offset corresponding to gfn in some way. Ditto
for mmap() on the VM fd, though that would require additional changes outside
of pKVM.

That would allow pKVM to support in-place conversions without the private-fd
having to differentiate between the type of protected VM, and without having to
provide new APIs from the private-fd. TDX, SNP, etc... Just Work by not
supporting the pKVM APIs.

And assuming we get multiple consumers down the road, pKVM will need to be able
to communicate the "true" state of a page to other consumers, because in
addition to being a consumer, pKVM is also an owner/enforcer analogous to the
TDX Module and the SEV PSP.
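
To make "offset corresponding to gfn in some way" a bit more concrete, here is a
rough userspace sketch of what the access pattern could look like. It is purely
illustrative: no such read()/write() interface exists on the KVM VM fd today,
and the 4 KiB page size, the offset == gfn << PAGE_SHIFT mapping, and every
helper name below (gfn_to_offset(), read_guest_page(), write_guest_page(),
open_fake_vm_fd()) are assumptions made up for this example. An unlinked
temporary file stands in for the VM fd so the code can run without KVM.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4 KiB guest pages; a real VM may use a different granule. */
#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  (1UL << GUEST_PAGE_SHIFT)

typedef uint64_t gfn_t;

/* Hypothetical mapping: file offset on the fd == guest physical address. */
static inline off_t gfn_to_offset(gfn_t gfn)
{
	return (off_t)(gfn << GUEST_PAGE_SHIFT);
}

/*
 * Copy one whole guest page out of the fd.
 * Returns 0 on success, -1 on error or short read. A real implementation
 * would presumably also fail here if the page is currently private.
 */
static int read_guest_page(int fd, gfn_t gfn, void *buf)
{
	ssize_t n;

	assert(buf != NULL);
	n = pread(fd, buf, GUEST_PAGE_SIZE, gfn_to_offset(gfn));
	return n == (ssize_t)GUEST_PAGE_SIZE ? 0 : -1;
}

/* Copy one whole page into the fd. Returns 0 on success, -1 otherwise. */
static int write_guest_page(int fd, gfn_t gfn, const void *buf)
{
	ssize_t n;

	assert(buf != NULL);
	n = pwrite(fd, buf, GUEST_PAGE_SIZE, gfn_to_offset(gfn));
	return n == (ssize_t)GUEST_PAGE_SIZE ? 0 : -1;
}

/* Stand-in for a VM fd: an unlinked temp file, NOT a real KVM fd. */
static int open_fake_vm_fd(void)
{
	char path[] = "/tmp/fake-vm-XXXXXX";
	int fd = mkstemp(path);

	if (fd >= 0)
		unlink(path);
	return fd;
}
```

A caller would do write_guest_page(fd, gfn, buf) followed later by
read_guest_page(fd, gfn, buf), with no mapping of guest memory into the host
process at any point. Whether the offset should be a raw gpa, a gfn, or
something memslot-relative is exactly the "in some way" part that is still
open.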