On Fri, Sep 30, 2022 at 05:19:00PM +0100, Fuad Tabba wrote:
> Hi,
>
> On Tue, Sep 27, 2022 at 11:47 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Mon, Sep 26, 2022, Fuad Tabba wrote:
> > > Hi,
> > >
> > > On Mon, Sep 26, 2022 at 3:28 PM Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Sep 23, 2022 at 04:19:46PM +0100, Fuad Tabba wrote:
> > > > > > Then on the KVM side, its mmap_start() + mmap_end() sequence would:
> > > > > >
> > > > > > 1. Not be supported for TDX or SEV-SNP because they don't allow adding non-zero
> > > > > > memory into the guest (after the pre-boot phase).
> > > > > >
> > > > > > 2. Be mutually exclusive with shared<=>private conversions, and be allowed if
> > > > > > and only if the entire gfn range of the associated memslot is shared.
> > > > >
> > > > > In general I think that this would work with pKVM. However, limiting
> > > > > private<->shared conversions to the granularity of a whole memslot
> > > > > might be difficult to handle in pKVM, since the guest doesn't have the
> > > > > concept of memslots. For example, in pKVM right now, when a guest
> > > > > shares back its restricted DMA pool with the host it does so at the
> > > > > page level.
> >
> > Y'all are killing me :-)
>
> :D
>
> > Isn't the guest enlightened? E.g. can't you tell the guest "thou shalt share at
> > granularity X"? With KVM's newfangled scalable memslots and per-vCPU MRU slot,
> > X doesn't even have to be that high to get reasonable performance, e.g. assuming
> > the DMA pool is at most 2GiB, that's "only" 1024 memslots, which is supposed to
> > work just fine in KVM.
>
> The guest is potentially enlightened, but the host doesn't necessarily
> know which memslot the guest might want to share back, since it
> doesn't know where the guest might want to place the DMA pool. If I
> understand this correctly, for this to work, all memslots would need
> to be the same size and sharing would always need to happen at that
> granularity.
>
> Moreover, for something like a small DMA pool this might scale, but
> I'm not sure about potential future workloads (e.g., multimedia
> in-place sharing).
>
> > > > >
> > > > > pKVM would also need a way to make an fd accessible again
> > > > > when shared back, which I think isn't possible with this patch.
> > > >
> > > > But does pKVM really want to mmap/munmap a new region at the page level?
> > > > That can cause VMA fragmentation if the conversion is frequent, as I see
> > > > it. Even with a KVM ioctl for mapping, as mentioned below, I think there
> > > > will be the same issue.
> > >
> > > pKVM doesn't really need to unmap the memory. What is really important
> > > is that the memory is not GUP'able.
> >
> > Well, not entirely unguppable, just unguppable without a magic FOLL_* flag,
> > otherwise KVM wouldn't be able to get the PFN to map into guest memory.
> >
> > The problem is that gup() and "mapped" are tied together. So yes, pKVM doesn't
> > strictly need to unmap memory _in the untrusted host_, but since mapped==guppable,
> > the end result is the same.
> >
> > Emphasis above because pKVM still needs to unmap the memory _somewhere_. IIUC, the
> > current approach is to do that only in the stage-2 page tables, i.e. only in the
> > context of the hypervisor. Which is also the source of the gup() problems; the
> > untrusted kernel is blissfully unaware that the memory is inaccessible.
> >
> > Any approach that moves some of that information into the untrusted kernel so that
> > the kernel can protect itself will incur fragmentation in the VMAs. Well, unless
> > all of guest memory becomes unguppable, but that's likely not a viable option.
>
> Actually, for pKVM, there is no need for the guest memory to be
> GUP'able at all if we use the new inaccessible_get_pfn().

If pKVM can use inaccessible_get_pfn() to get the pfn and avoid GUP (I
think that is the major concern?), do you see any other gap in the
existing API?
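
Something like the below is roughly the lookup I have in mind for the
pKVM fault path. This is an untested sketch for discussion only: the
memslot fields (private_file/private_offset) and the helper itself are
made up here for illustration, and the inaccessible_get_pfn() /
inaccessible_put_pfn() prototypes should be taken from the actual
patches rather than from this snippet.

#include <linux/kvm_host.h>
#include <linux/pfn_t.h>
#include <linux/memfd.h>	/* where this series declares inaccessible_get_pfn(), IIRC */

/*
 * Sketch: resolve a private gfn straight from the backing fd, with no
 * host userspace mapping and no gup().
 */
static int pkvm_private_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
				   kvm_pfn_t *pfn)
{
	pgoff_t index = gfn - slot->base_gfn +
			(slot->private_offset >> PAGE_SHIFT);
	pfn_t pfnt;
	int order, ret;

	/* order (largest mapping size available) is unused in this sketch. */
	ret = inaccessible_get_pfn(slot->private_file, index, &pfnt, &order);
	if (ret)
		return ret;

	*pfn = pfn_t_to_pfn(pfnt);
	/*
	 * The caller installs *pfn in the guest stage-2 and then drops the
	 * reference with inaccessible_put_pfn(slot->private_file, pfnt).
	 */
	return 0;
}

Nothing in that path requires the page to be mapped, and therefore
gup()'able, in the untrusted host.
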
> This of course goes back to what I'd mentioned before in v7; it seems
> that representing the memslot memory as a file descriptor should be
> orthogonal to whether the memory is shared or private, rather than a
> private_fd for private memory and the userspace_addr for shared
> memory. The host can then map or unmap the shared/private memory using
> the fd, which allows it more freedom in even choosing to unmap shared
> memory when not needed, for example.

Using both private_fd and userspace_addr is only needed for TDX and other
confidential computing scenarios; pKVM may use only private_fd if the fd
can also be mmap()ed as a whole to userspace, as Sean suggested.

Thanks,
Chao

>
> Cheers,
> /fuad
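
For completeness, a rough userspace-side sketch of the single-fd
arrangement I mean above: one fd backs the whole slot, registered as
private_fd, and (if mmap() of the fd as a whole were allowed, per Sean's
suggestion; it is not in the series as posted) also mapped for the
host's view of shared pages. The _ext struct layout and the
MFD_INACCESSIBLE/KVM_MEM_PRIVATE values below are placeholders, not
necessarily what the series defines.

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>
#include <linux/memfd.h>

/* Placeholder values: take the real ones from the patched uapi headers. */
#ifndef MFD_INACCESSIBLE
#define MFD_INACCESSIBLE	0x0010U
#endif
#ifndef KVM_MEM_PRIVATE
#define KVM_MEM_PRIVATE		(1u << 2)
#endif

/* Illustrative layout of the extended memslot ioctl argument. */
struct kvm_userspace_memory_region_ext {
	struct kvm_userspace_memory_region region;
	__u64 private_offset;
	__u32 private_fd;
	__u32 pad1;
	__u64 pad2[14];
};

/* Back one memslot entirely with a single inaccessible memfd. */
static int setup_private_slot(int vm_fd, uint32_t slot, uint64_t gpa,
			      uint64_t size)
{
	int fd = memfd_create("guest-mem", MFD_INACCESSIBLE);

	if (fd < 0 || ftruncate(fd, size))
		return -1;

	/*
	 * Hypothetical: map the whole fd so the host can access whatever
	 * ranges the guest shares back. This only works if mmap() on the
	 * fd is permitted, which is the open question in this thread.
	 */
	void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (hva == MAP_FAILED)
		return -1;

	struct kvm_userspace_memory_region_ext mem = {
		.region = {
			.slot			= slot,
			.flags			= KVM_MEM_PRIVATE,
			.guest_phys_addr	= gpa,
			.memory_size		= size,
			.userspace_addr		= (uint64_t)(uintptr_t)hva,
		},
		.private_fd	= fd,
		.private_offset	= 0,
	};
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
}

At least in this hypothetical shape, page-level conversions would not
require per-page mmap()/munmap() in the host, which was the VMA
fragmentation concern above.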