On Sat, 2021-01-30 at 18:54 +0000, Stefan Hajnoczi wrote:
> On Thu, Jan 28, 2021 at 09:32:22PM +0300, Elena Afanasova wrote:
> > Add ioregionfd context and kvm_io_device_ops->prepare/finish()
> > in order to serialize all bytes requested by guest.
> > 
> > Signed-off-by: Elena Afanasova <eafanasova@xxxxxxxxx>
> > ---
> >  arch/x86/kvm/x86.c       |  19 ++++++++
> >  include/kvm/iodev.h      |  14 ++++++
> >  include/linux/kvm_host.h |   4 ++
> >  virt/kvm/ioregion.c      | 102 +++++++++++++++++++++++++++++++++------
> >  virt/kvm/kvm_main.c      |  32 ++++++++++++
> >  5 files changed, 157 insertions(+), 14 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a04516b531da..393fb0f4bf46 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5802,6 +5802,8 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
> >  	int ret = 0;
> >  	bool is_apic;
> >  
> > +	kvm_io_bus_prepare(vcpu, KVM_MMIO_BUS, addr, len);
> > +
> >  	do {
> >  		n = min(len, 8);
> >  		is_apic = lapic_in_kernel(vcpu) &&
> > @@ -5823,8 +5825,10 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
> >  	if (ret == -EINTR) {
> >  		vcpu->run->exit_reason = KVM_EXIT_INTR;
> >  		++vcpu->stat.signal_exits;
> > +		return handled;
> >  	}
> >  #endif
> > +	kvm_io_bus_finish(vcpu, KVM_MMIO_BUS, addr, len);
> 
> Hmm...it would be nice for kvm_io_bus_prepare() to return the idx or the
> device pointer so the devices don't need to be searched in
> read/write/finish. However, it's complicated by the loop which may
> access multiple devices.
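The suggestion can be sketched in self-contained userspace C. All types and
names below (struct io_bus, bus_prepare(), etc.) are hypothetical stand-ins,
not the actual KVM structures; the point is only that prepare() can hand the
matched device back to the caller so write()/finish() skip the second bus
search. As Stefan notes, the real code is harder because one MMIO access may
span multiple devices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KVM bus/device types under discussion. */
struct io_device {
	unsigned long start, len;	/* guest-physical range served */
	int prepared;			/* set by prepare(), cleared by finish() */
};

struct io_bus {
	struct io_device *devs;
	size_t count;
};

/* Linear range lookup, analogous to searching the bus on every access. */
static struct io_device *bus_find(struct io_bus *bus, unsigned long addr)
{
	for (size_t i = 0; i < bus->count; i++)
		if (addr >= bus->devs[i].start &&
		    addr < bus->devs[i].start + bus->devs[i].len)
			return &bus->devs[i];
	return NULL;
}

/*
 * The idea from the review: prepare() returns the matched device so the
 * caller can pass the pointer straight to write()/finish() instead of
 * repeating the search for each of them.
 */
static struct io_device *bus_prepare(struct io_bus *bus, unsigned long addr)
{
	struct io_device *dev = bus_find(bus, addr);

	if (dev)
		dev->prepared = 1;
	return dev;
}

static void bus_finish(struct io_device *dev)
{
	if (dev)
		dev->prepared = 0;
}
```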
Agree

> > @@ -9309,6 +9325,7 @@ static int complete_ioregion_mmio(struct kvm_vcpu *vcpu)
> >  		vcpu->mmio_cur_fragment++;
> >  	}
> >  
> > +	vcpu->ioregion_ctx.dev->ops->finish(vcpu->ioregion_ctx.dev);
> >  	vcpu->mmio_needed = 0;
> >  	if (!vcpu->ioregion_ctx.in) {
> >  		srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > @@ -9333,6 +9350,7 @@ static int complete_ioregion_pio(struct kvm_vcpu *vcpu)
> >  		vcpu->ioregion_ctx.val += vcpu->ioregion_ctx.len;
> >  	}
> >  
> > +	vcpu->ioregion_ctx.dev->ops->finish(vcpu->ioregion_ctx.dev);
> >  	if (vcpu->ioregion_ctx.in)
> >  		r = kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
> >  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > @@ -9352,6 +9370,7 @@ static int complete_ioregion_fast_pio(struct kvm_vcpu *vcpu)
> >  	complete_ioregion_access(vcpu, vcpu->ioregion_ctx.addr,
> >  				 vcpu->ioregion_ctx.len,
> >  				 vcpu->ioregion_ctx.val);
> > +	vcpu->ioregion_ctx.dev->ops->finish(vcpu->ioregion_ctx.dev);
> >  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >  
> >  	if (vcpu->ioregion_ctx.in) {
> 
> Normally userspace will invoke ioctl(KVM_RUN) and reach one of these
> completion functions, but what if the vcpu fd is closed instead?
> ->finish() should still be called to avoid leaks.
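The leak scenario can be illustrated with a minimal userspace sketch (all
names here, such as vcpu_release() and struct vcpu, are made up for
illustration, not the actual KVM code). The release path simply reuses the
same completion helper, which is a no-op when nothing is pending:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the pending-completion state in the review. */
struct io_device {
	int busy;			/* held between prepare() and finish() */
};

struct vcpu {
	struct io_device *pending;	/* device awaiting ->finish(), if any */
};

static void device_finish(struct io_device *dev)
{
	dev->busy = 0;
}

/* Normal path: a complete_ioregion_*() helper runs on the next KVM_RUN. */
static void complete_ioregion(struct vcpu *vcpu)
{
	if (vcpu->pending) {
		device_finish(vcpu->pending);
		vcpu->pending = NULL;
	}
}

/*
 * Stefan's point: if userspace closes the vcpu fd instead of re-entering
 * KVM_RUN, the release path must still call ->finish(), otherwise the
 * device stays marked busy forever.
 */
static void vcpu_release(struct vcpu *vcpu)
{
	complete_ioregion(vcpu);	/* idempotent: no-op when nothing pending */
}
```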
Will fix

> > diff --git a/include/kvm/iodev.h b/include/kvm/iodev.h
> > index d75fc4365746..db8a3c69b7bb 100644
> > --- a/include/kvm/iodev.h
> > +++ b/include/kvm/iodev.h
> > @@ -25,6 +25,8 @@ struct kvm_io_device_ops {
> >  			      gpa_t addr,
> >  			      int len,
> >  			      const void *val);
> > +	void (*prepare)(struct kvm_io_device *this);
> > +	void (*finish)(struct kvm_io_device *this);
> >  	void (*destructor)(struct kvm_io_device *this);
> >  };
> >  
> > @@ -55,6 +57,18 @@ static inline int kvm_iodevice_write(struct kvm_vcpu *vcpu,
> >  		: -EOPNOTSUPP;
> >  }
> >  
> > +static inline void kvm_iodevice_prepare(struct kvm_io_device *dev)
> > +{
> > +	if (dev->ops->prepare)
> > +		dev->ops->prepare(dev);
> > +}
> > +
> > +static inline void kvm_iodevice_finish(struct kvm_io_device *dev)
> > +{
> > +	if (dev->ops->finish)
> > +		dev->ops->finish(dev);
> > +}
> 
> A performance optimization: keep a separate list of struct
> kvm_io_devices that implement prepare/finish. That way the search
> doesn't need to iterate over devices that don't support this
> interface.

Thanks for the idea

> Before implementing an optimization like this it would be good to check
> how this patch affects performance on guests with many in-kernel devices
> (e.g. a guest that has many multi-queue virtio-net/blk devices with
> ioeventfd). ioregionfd shouldn't reduce performance of existing KVM
> configurations, so it's worth measuring.
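The side-list optimization could look roughly like the sketch below
(hypothetical userspace types, not KVM code): devices that implement the
optional hook are remembered at registration time, so the hot path never
iterates over devices without it. Per the review, this should only be
pursued if benchmarking shows the plain search actually hurts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with an optional hook, mirroring kvm_io_device_ops. */
struct dev {
	void (*prepare)(struct dev *d);
	int prepared;
};

#define MAX_DEVS 16

struct bus {
	struct dev *all[MAX_DEVS];
	size_t n_all;
	struct dev *with_prepare[MAX_DEVS];	/* side list: hook implementors */
	size_t n_with_prepare;
};

/* Registration decides once whether a device belongs on the side list. */
static void bus_register(struct bus *b, struct dev *d)
{
	b->all[b->n_all++] = d;
	if (d->prepare)
		b->with_prepare[b->n_with_prepare++] = d;
}

/* Hot path walks only the side list; returns how many devices it visited. */
static size_t bus_prepare_all(struct bus *b)
{
	for (size_t i = 0; i < b->n_with_prepare; i++)
		b->with_prepare[i]->prepare(b->with_prepare[i]);
	return b->n_with_prepare;
}

static void mark_prepared(struct dev *d)
{
	d->prepared = 1;
}
```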
> > diff --git a/virt/kvm/ioregion.c b/virt/kvm/ioregion.c
> > index da38124e1418..3474090ccc8c 100644
> > --- a/virt/kvm/ioregion.c
> > +++ b/virt/kvm/ioregion.c
> > @@ -1,6 +1,6 @@
> >  // SPDX-License-Identifier: GPL-2.0-only
> >  #include <linux/kvm_host.h>
> > -#include <linux/fs.h>
> > +#include <linux/wait.h>
> >  #include <kvm/iodev.h>
> >  #include "eventfd.h"
> >  #include <uapi/linux/ioregion.h>
> > @@ -12,15 +12,23 @@ kvm_ioregionfd_init(struct kvm *kvm)
> >  	INIT_LIST_HEAD(&kvm->ioregions_pio);
> >  }
> >  
> > +/* Serializes ioregionfd cmds/replies */
> 
> Please expand on this comment:
> 
>   ioregions that share the same rfd are serialized so that only one vCPU
>   thread sends a struct ioregionfd_cmd to userspace at a time. This
>   ensures that the struct ioregionfd_resp received from userspace will
>   be processed by the one and only vCPU thread that sent it.
> 
>   A waitqueue is used to wake up waiting vCPU threads in order. Most of
>   the time the waitqueue is unused and the lock is not contended.
>   For best performance userspace should set up ioregionfds so that there
>   is no contention (e.g. dedicated ioregionfds for queue doorbell
>   registers on multi-queue devices).
> 
> A comment along these lines will give readers an idea of why the code
> does this.

Ok, thank you
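The serialization property Stefan describes can be modeled in userspace with
pthreads. This is only an analogue: a plain mutex stands in for the kernel's
lock-plus-waitqueue pair (and, unlike a waitqueue, does not guarantee that
waiters wake in FIFO order); all names are invented for the sketch. The
invariant being modeled is that at most one "vCPU" thread has a cmd/resp
exchange in flight per rfd at any moment.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of the per-rfd serialization state. */
struct ioregion_serializer {
	pthread_mutex_t lock;
	int in_flight;		/* exchanges currently active (should be <= 1) */
	int max_in_flight;	/* high-water mark, for checking the invariant */
};

static void serializer_init(struct ioregion_serializer *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->in_flight = 0;
	s->max_in_flight = 0;
}

/* One complete cmd -> resp exchange, done entirely under the per-rfd lock. */
static void do_exchange(struct ioregion_serializer *s)
{
	pthread_mutex_lock(&s->lock);
	s->in_flight++;
	if (s->in_flight > s->max_in_flight)
		s->max_in_flight = s->in_flight;
	/* ...send struct ioregionfd_cmd, wait for struct ioregionfd_resp... */
	s->in_flight--;
	pthread_mutex_unlock(&s->lock);
}

/* Each "vCPU" thread hammers the same rfd to create contention. */
static void *vcpu_thread(void *arg)
{
	for (int i = 0; i < 1000; i++)
		do_exchange(arg);
	return NULL;
}
```

Contention here is the worst case; as the review notes, userspace gets the
best performance by giving each hot register (e.g. a queue doorbell) its own
ioregionfd so the lock is effectively uncontended.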