On Thu, Mar 07, 2024, David Matlack wrote:
> On 2024-03-08 01:20 PM, Huang, Kai wrote:
> > > > > +:Parameters: struct kvm_memory_mapping(in/out)
> > > > > +:Returns: 0 on success, <0 on error
> > > > > +
> > > > > +KVM_MAP_MEMORY populates guest memory without running vcpu.
> > > > > +
> > > > > +::
> > > > > +
> > > > > +  struct kvm_memory_mapping {
> > > > > +    __u64 base_gfn;
> > > > > +    __u64 nr_pages;
> > > > > +    __u64 flags;
> > > > > +    __u64 source;
> > > > > +  };
> > > > > +
> > > > > +  /* For kvm_memory_mapping:: flags */
> > > > > +  #define KVM_MEMORY_MAPPING_FLAG_WRITE _BITULL(0)
> > > > > +  #define KVM_MEMORY_MAPPING_FLAG_EXEC  _BITULL(1)
> > > > > +  #define KVM_MEMORY_MAPPING_FLAG_USER  _BITULL(2)
> > > >
> > > > I am not sure what's the good of having "FLAG_USER"?
> > > >
> > > > This ioctl is called from userspace, thus I think we can just treat this always
> > > > as user-fault?
> > >
> > > The point is how to emulate kvm page fault as if vcpu caused the kvm page
> > > fault.  Not we call the ioctl as user context.
> >
> > Sorry I don't quite follow.  What's wrong if KVM just append the #PF USER
> > error bit before it calls into the fault handler?
> >
> > My question is, since this is ABI, you have to tell how userspace is
> > supposed to use this.  Maybe I am missing something, but I don't see how
> > USER should be used here.
>
> If we restrict this API to the TDP MMU then KVM_MEMORY_MAPPING_FLAG_USER
> is meaningless, PFERR_USER_MASK is only relevant for shadow paging.

+1

> KVM_MEMORY_MAPPING_FLAG_WRITE seems useful to allow memslots to be
> populated with writes (which avoids just faulting in the zero-page for
> anon or tmpfs backed memslots), while also allowing populating read-only
> memslots.
>
> I don't really see a use-case for KVM_MEMORY_MAPPING_FLAG_EXEC.

It would be mildly interesting for something like the NX hugepage mitigation.

For the initial implementation, I don't think the ioctl() should specify
protections, period.
VMA-based mappings, i.e. !guest_memfd, already have a way to specify
protections.  And for guest_memfd, for finer grained control in general, and
for long term compatibility with other features that are in-flight or
proposed, I would rather userspace specify RWX protections via
KVM_SET_MEMORY_ATTRIBUTES.  Oh, and dirty logging would be a pain too.

KVM doesn't currently support execute-only (XO) or !executable (RW), so I think
we can simply define KVM_MAP_MEMORY to behave like a read fault.  E.g. map RX,
and add W if all underlying protections allow it.

That way we can defer dealing with things like XO and RW *if* KVM ever does
gain support for specifying those combinations via KVM_SET_MEMORY_ATTRIBUTES,
which will likely be per-arch/vendor and non-trivial, e.g. AMD's NPT doesn't
even allow for XO memory.

And we shouldn't need to do anything for KVM_MAP_MEMORY in particular if
KVM_SET_MEMORY_ATTRIBUTES gains support for RWX protections for the existing
RWX and RX combinations, e.g. if there's a use-case for write-protecting
guest_memfd regions.

We can always expand the uAPI, but taking away functionality is much harder, if
not impossible.
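
To make the "behave like a read fault" idea concrete, here is a rough
standalone sketch in plain C.  It is not KVM code.  Everything in it is
hypothetical and invented for illustration: struct sketch_backing and its
fields, the SKETCH_PROT_* bits, and sketch_map_memory_prot().  The three write
restrictions are assumed stand-ins for whatever layers would actually constrain
a mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protection bits, for illustration only (not KVM's). */
#define SKETCH_PROT_R (1u << 0)
#define SKETCH_PROT_W (1u << 1)
#define SKETCH_PROT_X (1u << 2)

/*
 * Hypothetical summary of the layers that could restrict write access.
 * The field names are assumptions, not real KVM state.
 */
struct sketch_backing {
	bool memslot_readonly;   /* the memslot is read-only */
	bool host_writable;      /* the host mapping permits writes */
	bool attrs_allow_write;  /* per-page attributes permit writes */
};

/*
 * Emulate a read fault: always map RX, and add W only if every
 * underlying layer allows writes.  The caller passes no protections.
 */
static inline uint32_t sketch_map_memory_prot(const struct sketch_backing *b)
{
	uint32_t prot = SKETCH_PROT_R | SKETCH_PROT_X;

	assert(b != NULL);

	if (!b->memslot_readonly && b->host_writable && b->attrs_allow_write)
		prot |= SKETCH_PROT_W;

	return prot;
}
```

Note the caller has no say in the result; the protections fall out of the
backing state alone, which is why the WRITE/EXEC/USER flags aren't needed in
this model.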