Re: [RFC v2 2/4] KVM: x86: add support for ioregionfd signal handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 2021-01-30 at 16:58 +0000, Stefan Hajnoczi wrote:
> On Thu, Jan 28, 2021 at 09:32:21PM +0300, Elena Afanasova wrote:
> > The vCPU thread may receive a signal during ioregionfd
> > communication,
> > ioctl(KVM_RUN) needs to return to userspace and then ioctl(KVM_RUN)
> > must resume ioregionfd.
> > 
> > Signed-off-by: Elena Afanasova <eafanasova@xxxxxxxxx>
> > ---
> > Changes in v2:
> >   - add support for x86 signal handling
> >   - changes after code review
> > 
> >  arch/x86/kvm/x86.c            | 196
> > +++++++++++++++++++++++++++++++---
> >  include/linux/kvm_host.h      |  13 +++
> >  include/uapi/linux/ioregion.h |  32 ++++++
> >  virt/kvm/ioregion.c           | 177 +++++++++++++++++++++++++++++-
> >  virt/kvm/kvm_main.c           |  16 ++-
> >  5 files changed, 415 insertions(+), 19 deletions(-)
> >  create mode 100644 include/uapi/linux/ioregion.h
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index ddb28f5ca252..a04516b531da 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5799,19 +5799,33 @@ static int vcpu_mmio_write(struct kvm_vcpu
> > *vcpu, gpa_t addr, int len,
> >  {
> >  	int handled = 0;
> >  	int n;
> > +	int ret = 0;
> > +	bool is_apic;
> >  
> >  	do {
> >  		n = min(len, 8);
> > -		if (!(lapic_in_kernel(vcpu) &&
> > -		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev,
> > addr, n, v))
> > -		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n,
> > v))
> > -			break;
> > +		is_apic = lapic_in_kernel(vcpu) &&
> > +			  !kvm_iodevice_write(vcpu, &vcpu->arch.apic-
> > >dev,
> > +					      addr, n, v);
> > +		if (!is_apic) {
> > +			ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS,
> > +					       addr, n, v);
> > +			if (ret)
> > +				break;
> > +		}
> >  		handled += n;
> >  		addr += n;
> >  		len -= n;
> >  		v += n;
> >  	} while (len);
> >  
> > +#ifdef CONFIG_KVM_IOREGION
> > +	if (ret == -EINTR) {
> > +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> > +		++vcpu->stat.signal_exits;
> > +	}
> > +#endif
> > +
> >  	return handled;
> >  }
> 
> There is a special case for crossing page boundaries:
> 1. ioregion in the first 4 bytes (page 1) but not the second 4 bytes
> (page 2).
> 2. ioregion in the second 4 bytes (page 2) but not the first 4 bytes
> (page 1).
> 3. The first 4 bytes (page 1) in one ioregion and the second 4 bytes
> (page 2) in another ioregion.
> 4. The first 4 bytes (page 1) in one ioregion and the second 4 bytes
> (page 2) in the same ioregion.
> 
> Cases 3 and 4 are tricky. If I'm reading the code correctly we try
> ioregion accesses twice, even if the first one returns -EINTR?
> 
Yes, in the case of crossing a page boundary
emulator_read_write_onepage() will be called twice. This case isn’t
supported in the current code. Also I think that synchronization code
for vcpu_mmio_write/read() is wrong. Probably
kvm_io_bus_prepare/finish() should be called for every
kvm_io_bus_read/write(). I'll try to fix that.

> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 7cd667dddba9..5cfdecfca6db 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -318,6 +318,19 @@ struct kvm_vcpu {
> >  #endif
> >  	bool preempted;
> >  	bool ready;
> > +#ifdef CONFIG_KVM_IOREGION
> > +	bool ioregion_interrupted;
> 
> Can this field move into ioregion_ctx?
> 
Yes

> > +	struct {
> > +		struct kvm_io_device *dev;
> > +		int pio;
> > +		void *val;
> > +		u8 state;
> > +		u64 addr;
> > +		int len;
> > +		u64 data;
> > +		bool in;
> > +	} ioregion_ctx;
> 
> This struct can be reordered to remove holes between fields.
> 
Ok, will do

> > +#endif
> >  	struct kvm_vcpu_arch arch;
> >  };
> >  
> > diff --git a/include/uapi/linux/ioregion.h
> > b/include/uapi/linux/ioregion.h
> > new file mode 100644
> > index 000000000000..7898c01f84a1
> > --- /dev/null
> > +++ b/include/uapi/linux/ioregion.h
> > @@ -0,0 +1,32 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> 
> To encourage people to implement the wire protocol even beyond the
> Linux
> syscall environment (e.g. in other hypervisors and VMMs) you could
> make
> the license more permissive:
> 
>   /* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) 
> OR BSD-3-Clause) */
> 
> Several other <linux/*.h> files do this so that the header can be
> used
> outside Linux without license concerns.
> 
> Here is the BSD 3-Clause license:
> https://opensource.org/licenses/BSD-3-Clause
> 
> > +#ifndef _UAPI_LINUX_IOREGION_H
> > +#define _UAPI_LINUX_IOREGION_H
> 
> Please add the wire protocol specification/documentation into this
> file.
> That way this header file will serve as a comprehensive reference for
> the protocol and changes to the header will also update the
> documentation.
> 
> (The ioctl KVM_SET_IOREGIONFD parts belong in
> Documentation/virt/kvm/api.rst but the wire protocol should be in
> this
> header file instead.)
> 
Ok

> > +
> > +/* Wire protocol */
> > +struct ioregionfd_cmd {
> > +	__u32 info;
> > +	__u32 padding;
> > +	__u64 user_data;
> > +	__u64 offset;
> > +	__u64 data;
> > +};
> > +
> > +struct ioregionfd_resp {
> > +	__u64 data;
> > +	__u8 pad[24];
> > +};
> > +
> > +#define IOREGIONFD_CMD_READ    0
> > +#define IOREGIONFD_CMD_WRITE   1
> > +
> > +#define IOREGIONFD_SIZE_8BIT   0
> > +#define IOREGIONFD_SIZE_16BIT  1
> > +#define IOREGIONFD_SIZE_32BIT  2
> > +#define IOREGIONFD_SIZE_64BIT  3
> 
> It's possible that larger read/write operations will be needed in the
> future. For example, the PCI Express bus supports much larger
> transactions than just 64 bits.
> 
> You don't need to address this right now but I wanted to mention it.
> 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 88b92fc3da51..df387857f51f 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -4193,6 +4193,7 @@ static int __kvm_io_bus_write(struct kvm_vcpu
> > *vcpu, struct kvm_io_bus *bus,
> >  			      struct kvm_io_range *range, const void
> > *val)
> >  {
> >  	int idx;
> > +	int ret = 0;
> >  
> >  	idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
> >  	if (idx < 0)
> > @@ -4200,9 +4201,12 @@ static int __kvm_io_bus_write(struct
> > kvm_vcpu *vcpu, struct kvm_io_bus *bus,
> >  
> >  	while (idx < bus->dev_count &&
> >  		kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
> > -		if (!kvm_iodevice_write(vcpu, bus->range[idx].dev,
> > range->addr,
> > -					range->len, val))
> > +		ret = kvm_iodevice_write(vcpu, bus->range[idx].dev,
> > range->addr,
> > +					 range->len, val);
> > +		if (!ret)
> >  			return idx;
> > +		if (ret < 0 && ret != -EOPNOTSUPP)
> > +			return ret;
> 
> I audited all kvm_io_bus_read/write() callers to check that it's safe
> to
> add error return values besides -EOPNOTSUPP. Extending the meaning of
> the return value is fine but any arches that want to support
> ioregionfd
> need to explicitly handle -EINTR return values now. Only x86 does
> after
> this patch.




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux