On Tue, Jun 25, 2024 at 02:54:09PM +0800, Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx> wrote: > > > On 2/26/2024 4:26 PM, isaku.yamahata@xxxxxxxxx wrote: > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > > > Export kvm_io_bus_read and kvm_mmio tracepoint and wire up TDX PV MMIO > > hypercall to the KVM backend functions. > > > > kvm_io_bus_read/write() searches KVM device emulated in kernel of the given > > MMIO address and emulates the MMIO. As TDX PV MMIO also needs it, export > > kvm_io_bus_read(). kvm_io_bus_write() is already exported. TDX PV MMIO > > emulates some of MMIO itself. To add trace point consistently with x86 > > kvm, export kvm_mmio tracepoint. > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > --- > > arch/x86/kvm/vmx/tdx.c | 114 +++++++++++++++++++++++++++++++++++++++++ > > arch/x86/kvm/x86.c | 1 + > > virt/kvm/kvm_main.c | 2 + > > 3 files changed, 117 insertions(+) > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index 55fc6cc6c816..389bb95d2af0 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -1217,6 +1217,118 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu) > > return ret; > > } > > +static int tdx_complete_mmio(struct kvm_vcpu *vcpu) > > +{ > > + unsigned long val = 0; > > + gpa_t gpa; > > + int size; > > + > > + KVM_BUG_ON(vcpu->mmio_needed != 1, vcpu->kvm); > > + vcpu->mmio_needed = 0; > mmio_needed is used by instruction emulator to setup the complete callback. > Since TDX handle MMIO in a PV way, mmio_needed is not needed here. Ok, we don't need to update mmio_needed. > > + > > + if (!vcpu->mmio_is_write) { > It's also needed by instruction emulator, we can use > vcpu->run->mmio.is_write instead. No because vcpu->run->mmio is shared with user space. KVM need to stash it independently. > > > + gpa = vcpu->mmio_fragments[0].gpa; > > + size = vcpu->mmio_fragments[0].len; > > Since MMIO cross page boundary is not allowed according to the input checks > from TDVMCALL, these mmio_fragments[] is not needed. > Just use vcpu->run->mmio.phys_addr and vcpu->run->mmio.len? ditto. > > + > > + memcpy(&val, vcpu->run->mmio.data, size); > > + tdvmcall_set_return_val(vcpu, val); > > + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); > > + } > > Tracepoint for KVM_TRACE_MMIO_WRITE is missing when it is handled in > userspace. tdx_mmio_write() has it before existing to the user space. It matches with how write_mmio() behaves in x86.c. Hmm, to match with other code, we should remove trace_kvm_mmio(KVM_TRACE_MMIO_READ) and keep KVM_TRACE_MMIO_READ_UNSATISFIED in tdx_emulate_mmio(). That's how read_prepare() and read_exit_mmio() behaves. For MMIO read - When kernel can handle the MMIO, KVM_TRACE_MMIO_READ with data. - When exiting to the user space, KVM_TRACE_MMIO_READ_UNSATISFIED before the exit. No trace after the user space handled the MMIO. For MMIO write - KVM_TRACE_MMIO_WRITE before handling it. > Also, the return code is only set when the emulation is done in kernel, but > not set when it's handled in userspace. > > > + return 1; > > +} > > How about the fixup as following: > > @@ -1173,19 +1173,18 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu) > static int tdx_complete_mmio(struct kvm_vcpu *vcpu) { unsigned long val = 0; > - gpa_t gpa; - int size; - - vcpu->mmio_needed = 0; - - if > (!vcpu->mmio_is_write) { - gpa = vcpu->mmio_fragments[0].gpa; - size = > vcpu->mmio_fragments[0].len; + gpa_t gpa = vcpu->run->mmio.phys_addr; + int > size = vcpu->run->mmio.len; + if (vcpu->run->mmio.is_write) { + > trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); + } else { > memcpy(&val, vcpu->run->mmio.data, size); tdvmcall_set_return_val(vcpu, > val); trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); } + + > tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS); return 1; } > > > > > + > > +static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size, > > + unsigned long val) > > +{ > > + if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && > > + kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val)) > > + return -EOPNOTSUPP; > > + > > + trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); > > + return 0; > > +} > > + > > +static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size) > > +{ > > + unsigned long val; > > + > > + if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && > > + kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val)) > > + return -EOPNOTSUPP; > > + > > + tdvmcall_set_return_val(vcpu, val); > > + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); > > + return 0; > > +} > > + > > +static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) > > +{ > > + struct kvm_memory_slot *slot; > > + int size, write, r; > > + unsigned long val; > > + gpa_t gpa; > > + > > + KVM_BUG_ON(vcpu->mmio_needed, vcpu->kvm); > > + > [...] > > + > > + /* Request the device emulation to userspace device model. */ > > + vcpu->mmio_needed = 1; > > + vcpu->mmio_is_write = write; > Then they can be dropped. We may drop mmio_needed. mmio_is_write is needed as above. > > + vcpu->arch.complete_userspace_io = tdx_complete_mmio; > > + > > + vcpu->run->mmio.phys_addr = gpa; > > + vcpu->run->mmio.len = size; > > + vcpu->run->mmio.is_write = write; > > + vcpu->run->exit_reason = KVM_EXIT_MMIO; > > + > > + if (write) { > > + memcpy(vcpu->run->mmio.data, &val, size); > > + } else { > > + vcpu->mmio_fragments[0].gpa = gpa; > > + vcpu->mmio_fragments[0].len = size; > These two lines can be dropped as well. ditto. -- Isaku Yamahata <isaku.yamahata@xxxxxxxxx>