On Mon, Sep 02, 2024 at 12:22:56PM +1000, Alexey Kardashevskiy wrote: > > > On 31/8/24 02:57, Xu Yilun wrote: > > On Fri, Aug 23, 2024 at 11:21:27PM +1000, Alexey Kardashevskiy wrote: > > > Currently private MMIO nested page faults are not expected so when such > > > fault occurs, KVM tries moving the faulted page from private to shared > > > which is not going to work as private MMIO is not backed by memfd. > > > > > > Handle private MMIO as shared: skip page state change and memfd > > > > This means host keeps the mapping for private MMIO, which is different > > from private memory. Not sure if it is expected, and I want to get > > some directions here. > > There is no other translation table on AMD though, the same NPT. The Sorry for not being clear, when I say "host mapping" I mean host userspace mapping (host CR3 mapping). By using guest_memfd, there is no host CR3 mapping for private memory. I'm wondering if we could keep host CR3 mapping for private MMIO. > security is enforced by the RMP table. A device says "bar#x is private" so > the host + firmware ensure the each corresponding RMP entry is "assigned" + > "validated" and has a correct IDE stream ID and ASID, and the VM's kernel > maps it with the Cbit set. > > > From HW perspective, private MMIO is not intended to be accessed by > > host, but the consequence may varies. According to TDISP spec 11.2, > > my understanding is private device (known as TDI) should reject the > > TLP and transition to TDISP ERROR state. But no further error > > reporting or logging is mandated. So the impact to the host system > > is specific to each device. In my test environment, an AER > > NonFatalErr is reported and nothing more, much better than host > > accessing private memory. > > afair I get an non-fatal RMP fault so the device does not even notice. > > > On SW side, my concern is how to deal with mmu_notifier. In theory, if > > we get pfn from hva we should follow the userspace mapping change. But > > that makes no sense. Especially for TDX TEE-IO, private MMIO mapping > > in SEPT cannot be changed or invalidated as long as TDI is running. > > > Another concern may be specific for TDX TEE-IO. Allowing both userspace > > mapping and SEPT mapping may be safe for private MMIO, but on > > KVM_SET_USER_MEMORY_REGION2, KVM cannot actually tell if a userspace > > addr is really for private MMIO. I.e. user could provide shared memory > > addr to KVM but declare it is for private MMIO. The shared memory then > > could be mapped in SEPT and cause problem. > > I am missing lots of context here. When you are starting a guest with a > passed through device, until the TDISP machinery transitions the TDI into > RUN, this TDI's MMIO is shared and mapped everywhere. And after Yes, that's the situation nowadays. I think if we need to eliminate host CR3 mapping for private MMIO, a simple way is we don't allow host CR3 mapping at the first place, even for shared pass through. It is doable cause: 1. IIUC, host CR3 mapping for assigned MMIO is only used for pfn finding, i.e. host doesn't really (or shouldn't?) access them. 2. The hint from guest_memfd shows KVM doesn't have to rely on host CR3 mapping to find pfn. > transitioning to RUN you move mappings from EPT to SEPT? Mostly correct, TDX move mapping from EPT to SEPT after LOCKED and right before RUN. > > > So personally I prefer no host mapping for private MMIO. > > Nah, cannot skip this step on AMD. Thanks, Not sure if we are on the same page. I assume from HW perspective, host CR3 mapping is not necessary for NPT/RMP build? Thanks, Yilun > > > > > > Thanks, > > Yilun > > > > > page state tracking. > > > > > > The MMIO KVM memory slot is still marked as shared as the guest can > > > access it as private or shared so marking the MMIO slot as private > > > is not going to help. > > > > > > Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxx> > > > --- > > > arch/x86/kvm/mmu/mmu.c | 6 +++++- > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index 928cf84778b0..e74f5c3d0821 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -4366,7 +4366,11 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault > > > { > > > bool async; > > > - if (fault->is_private) > > > + if (fault->slot && fault->is_private && !kvm_slot_can_be_private(fault->slot) && > > > + (vcpu->kvm->arch.vm_type == KVM_X86_SNP_VM)) > > > + pr_warn("%s: private SEV TIO MMIO fault for fault->gfn=%llx\n", > > > + __func__, fault->gfn); > > > + else if (fault->is_private) > > > return kvm_faultin_pfn_private(vcpu, fault); > > > async = false; > > > -- > > > 2.45.2 > > > > > > > > -- > Alexey >