On Sat, 2021-01-30 at 16:58 +0000, Stefan Hajnoczi wrote: > On Thu, Jan 28, 2021 at 09:32:21PM +0300, Elena Afanasova wrote: > > The vCPU thread may receive a signal during ioregionfd > > communication, > > ioctl(KVM_RUN) needs to return to userspace and then ioctl(KVM_RUN) > > must resume ioregionfd. > > > > Signed-off-by: Elena Afanasova <eafanasova@xxxxxxxxx> > > --- > > Changes in v2: > > - add support for x86 signal handling > > - changes after code review > > > > arch/x86/kvm/x86.c | 196 > > +++++++++++++++++++++++++++++++--- > > include/linux/kvm_host.h | 13 +++ > > include/uapi/linux/ioregion.h | 32 ++++++ > > virt/kvm/ioregion.c | 177 +++++++++++++++++++++++++++++- > > virt/kvm/kvm_main.c | 16 ++- > > 5 files changed, 415 insertions(+), 19 deletions(-) > > create mode 100644 include/uapi/linux/ioregion.h > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index ddb28f5ca252..a04516b531da 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -5799,19 +5799,33 @@ static int vcpu_mmio_write(struct kvm_vcpu > > *vcpu, gpa_t addr, int len, > > { > > int handled = 0; > > int n; > > + int ret = 0; > > + bool is_apic; > > > > do { > > n = min(len, 8); > > - if (!(lapic_in_kernel(vcpu) && > > - !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, > > addr, n, v)) > > - && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, > > v)) > > - break; > > + is_apic = lapic_in_kernel(vcpu) && > > + !kvm_iodevice_write(vcpu, &vcpu->arch.apic- > > >dev, > > + addr, n, v); > > + if (!is_apic) { > > + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, > > + addr, n, v); > > + if (ret) > > + break; > > + } > > handled += n; > > addr += n; > > len -= n; > > v += n; > > } while (len); > > > > +#ifdef CONFIG_KVM_IOREGION > > + if (ret == -EINTR) { > > + vcpu->run->exit_reason = KVM_EXIT_INTR; > > + ++vcpu->stat.signal_exits; > > + } > > +#endif > > + > > return handled; > > } > > There is a special case for crossing page boundaries: > 1. ioregion in the first 4 bytes (page 1) but not the second 4 bytes > (page 2). > 2. ioregion in the second 4 bytes (page 2) but not the first 4 bytes > (page 1). > 3. The first 4 bytes (page 1) in one ioregion and the second 4 bytes > (page 2) in another ioregion. > 4. The first 4 bytes (page 1) in one ioregion and the second 4 bytes > (page 2) in the same ioregion. > > Cases 3 and 4 are tricky. If I'm reading the code correctly we try > ioregion accesses twice, even if the first one returns -EINTR? > Yes, in the case of crossing a page boundary emulator_read_write_onepage() will be called twice. This case isn’t supported in the current code. Also I think that synchronization code for vcpu_mmio_write/read() is wrong. Probably kvm_io_bus_prepare/finish() should be called for every kvm_io_bus_read/write(). I'll try to fix that. > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index 7cd667dddba9..5cfdecfca6db 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -318,6 +318,19 @@ struct kvm_vcpu { > > #endif > > bool preempted; > > bool ready; > > +#ifdef CONFIG_KVM_IOREGION > > + bool ioregion_interrupted; > > Can this field move into ioregion_ctx? > Yes > > + struct { > > + struct kvm_io_device *dev; > > + int pio; > > + void *val; > > + u8 state; > > + u64 addr; > > + int len; > > + u64 data; > > + bool in; > > + } ioregion_ctx; > > This struct can be reordered to remove holes between fields. > Ok, will do > > +#endif > > struct kvm_vcpu_arch arch; > > }; > > > > diff --git a/include/uapi/linux/ioregion.h > > b/include/uapi/linux/ioregion.h > > new file mode 100644 > > index 000000000000..7898c01f84a1 > > --- /dev/null > > +++ b/include/uapi/linux/ioregion.h > > @@ -0,0 +1,32 @@ > > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ > > To encourage people to implement the wire protocol even beyond the > Linux > syscall environment (e.g. in other hypervisors and VMMs) you could > make > the license more permissive: > > /* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) > OR BSD-3-Clause) */ > > Several other <linux/*.h> files do this so that the header can be > used > outside Linux without license concerns. > > Here is the BSD 3-Clause license: > https://opensource.org/licenses/BSD-3-Clause > > > +#ifndef _UAPI_LINUX_IOREGION_H > > +#define _UAPI_LINUX_IOREGION_H > > Please add the wire protocol specification/documentation into this > file. > That way this header file will serve as a comprehensive reference for > the protocol and changes to the header will also update the > documentation. > > (The ioctl KVM_SET_IOREGIONFD parts belong in > Documentation/virt/kvm/api.rst but the wire protocol should be in > this > header file instead.) > Ok > > + > > +/* Wire protocol */ > > +struct ioregionfd_cmd { > > + __u32 info; > > + __u32 padding; > > + __u64 user_data; > > + __u64 offset; > > + __u64 data; > > +}; > > + > > +struct ioregionfd_resp { > > + __u64 data; > > + __u8 pad[24]; > > +}; > > + > > +#define IOREGIONFD_CMD_READ 0 > > +#define IOREGIONFD_CMD_WRITE 1 > > + > > +#define IOREGIONFD_SIZE_8BIT 0 > > +#define IOREGIONFD_SIZE_16BIT 1 > > +#define IOREGIONFD_SIZE_32BIT 2 > > +#define IOREGIONFD_SIZE_64BIT 3 > > It's possible that larger read/write operations will be needed in the > future. For example, the PCI Express bus supports much larger > transactions than just 64 bits. > > You don't need to address this right now but I wanted to mention it. > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 88b92fc3da51..df387857f51f 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -4193,6 +4193,7 @@ static int __kvm_io_bus_write(struct kvm_vcpu > > *vcpu, struct kvm_io_bus *bus, > > struct kvm_io_range *range, const void > > *val) > > { > > int idx; > > + int ret = 0; > > > > idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len); > > if (idx < 0) > > @@ -4200,9 +4201,12 @@ static int __kvm_io_bus_write(struct > > kvm_vcpu *vcpu, struct kvm_io_bus *bus, > > > > while (idx < bus->dev_count && > > kvm_io_bus_cmp(range, &bus->range[idx]) == 0) { > > - if (!kvm_iodevice_write(vcpu, bus->range[idx].dev, > > range->addr, > > - range->len, val)) > > + ret = kvm_iodevice_write(vcpu, bus->range[idx].dev, > > range->addr, > > + range->len, val); > > + if (!ret) > > return idx; > > + if (ret < 0 && ret != -EOPNOTSUPP) > > + return ret; > > I audited all kvm_io_bus_read/write() callers to check that it's safe > to > add error return values besides -EOPNOTSUPP. Extending the meaning of > the return value is fine but any arches that want to support > ioregionfd > need to explicitly handle -EINTR return values now. Only x86 does > after > this patch.