On Fri, Nov 19, 2021, Jason Gunthorpe wrote:
> On Fri, Nov 19, 2021 at 10:21:39PM +0000, Sean Christopherson wrote:
> > On Fri, Nov 19, 2021, Jason Gunthorpe wrote:
> > > On Fri, Nov 19, 2021 at 07:18:00PM +0000, Sean Christopherson wrote:
> > > > No ideas for the kernel API, but that's also less concerning since
> > > > it's not set in stone. I'm also not sure that dedicated APIs for
> > > > each high-ish level use case would be a bad thing, as the semantics
> > > > are unlikely to be different to some extent. E.g. for the KVM use
> > > > case, there can be at most one guest associated with the fd, but
> > > > there can be any number of VFIO devices attached to the fd.
> > >
> > > Even the kvm thing is not a hard restriction when you take away
> > > confidential compute.
> > >
> > > Why can't we have multiple KVMs linked to the same FD if the memory
> > > isn't encrypted? Sure it isn't actually useful but it should work
> > > fine.
> >
> > Hmm, true, but I want the KVM semantics to be 1:1 even if memory
> > isn't encrypted.
>
> That is policy and it doesn't belong hardwired into the kernel.

Agreed. I had a blurb typed up about that policy just being an "exclusive" flag
in the kernel API that KVM would set when creating a confidential VM, but deleted
it and forgot to restore it when I went down the tangent of removing userspace
from the TCB without an assist from hardware/firmware.

> Your explanation makes me think that the F_SEAL_XX isn't defined
> properly. It should be a userspace trap door to prevent any new
> external accesses, including establishing new kvms, iommu's, rdmas,
> mmaps, read/write, etc.

Hmm, the way I was thinking of it is that the F_SEAL_XX itself would prevent
mapping/accessing the memory from userspace, and that any policy beyond that
would be done via kernel APIs and thus handled by whatever in-kernel agent can
access the memory. E.g. in the confidential VM case, without support for trusted
devices, KVM would require that it be the sole owner of the file.
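
To make that split concrete, here is a rough userspace toy model of the
semantics described above. It is not kernel code and not a proposed API: the
struct, the field names, and both helpers are made up purely for illustration,
and "sealed" only stands in for the still-undefined F_SEAL_XX. The seal only
blocks userspace access, while the "exclusive" policy is enforced when an
in-kernel agent attaches.

```c
/* Toy userspace model, NOT kernel code. Every name below is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct guest_mem_file {
	bool sealed;     /* stand-in for F_SEAL_XX: no userspace map/read/write */
	bool exclusive;  /* set by an in-kernel user that demands sole ownership */
	int nr_owners;   /* in-kernel agents (KVM, VFIO, ...) currently attached */
};

/* Userspace access (mmap/read/write) is refused once the file is sealed. */
static int user_access(const struct guest_mem_file *f)
{
	return f->sealed ? -EPERM : 0;
}

/* In-kernel attach: the ownership policy lives here, not in the seal. */
static int kernel_attach(struct guest_mem_file *f, bool want_exclusive)
{
	if (f->exclusive)
		return -EBUSY;	/* an earlier user claimed sole ownership */
	if (want_exclusive && f->nr_owners)
		return -EBUSY;	/* cannot become sole owner of a shared file */

	f->exclusive = want_exclusive;
	f->nr_owners++;
	return 0;
}

/* Confidential VM case: KVM attaches exclusively, everyone else is refused. */
static void model_selftest(void)
{
	struct guest_mem_file f = { .sealed = true };

	assert(user_access(&f) == -EPERM);
	assert(kernel_attach(&f, true) == 0);       /* KVM, confidential VM */
	assert(kernel_attach(&f, false) == -EBUSY); /* e.g. an untrusted device */
}
```

In the non-confidential case nobody would pass want_exclusive, so any number
of in-kernel users could attach to the same file, which matches the point that
1:1 is policy rather than something hardwired.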