Re: [RFC v2 2/4] KVM: x86: add support for ioregionfd signal handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 28, 2021 at 09:32:21PM +0300, Elena Afanasova wrote:
> The vCPU thread may receive a signal during ioregionfd communication,
> ioctl(KVM_RUN) needs to return to userspace and then ioctl(KVM_RUN)
> must resume ioregionfd.
> 
> Signed-off-by: Elena Afanasova <eafanasova@xxxxxxxxx>
> ---
> Changes in v2:
>   - add support for x86 signal handling
>   - changes after code review
> 
>  arch/x86/kvm/x86.c            | 196 +++++++++++++++++++++++++++++++---
>  include/linux/kvm_host.h      |  13 +++
>  include/uapi/linux/ioregion.h |  32 ++++++
>  virt/kvm/ioregion.c           | 177 +++++++++++++++++++++++++++++-
>  virt/kvm/kvm_main.c           |  16 ++-
>  5 files changed, 415 insertions(+), 19 deletions(-)
>  create mode 100644 include/uapi/linux/ioregion.h
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ddb28f5ca252..a04516b531da 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5799,19 +5799,33 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
>  {
>  	int handled = 0;
>  	int n;
> +	int ret = 0;
> +	bool is_apic;
>  
>  	do {
>  		n = min(len, 8);
> -		if (!(lapic_in_kernel(vcpu) &&
> -		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, addr, n, v))
> -		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
> -			break;
> +		is_apic = lapic_in_kernel(vcpu) &&
> +			  !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev,
> +					      addr, n, v);
> +		if (!is_apic) {
> +			ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS,
> +					       addr, n, v);
> +			if (ret)
> +				break;
> +		}
>  		handled += n;
>  		addr += n;
>  		len -= n;
>  		v += n;
>  	} while (len);
>  
> +#ifdef CONFIG_KVM_IOREGION
> +	if (ret == -EINTR) {
> +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> +		++vcpu->stat.signal_exits;
> +	}
> +#endif
> +
>  	return handled;
>  }

There is a special case for crossing page boundaries:
1. ioregion in the first 4 bytes (page 1) but not the second 4 bytes (page 2).
2. ioregion in the second 4 bytes (page 2) but not the first 4 bytes (page 1).
3. The first 4 bytes (page 1) in one ioregion and the second 4 bytes (page 2) in another ioregion.
4. The first 4 bytes (page 1) in one ioregion and the second 4 bytes (page 2) in the same ioregion.

Cases 3 and 4 are tricky. If I'm reading the code correctly we try
ioregion accesses twice, even if the first one returns -EINTR?

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7cd667dddba9..5cfdecfca6db 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -318,6 +318,19 @@ struct kvm_vcpu {
>  #endif
>  	bool preempted;
>  	bool ready;
> +#ifdef CONFIG_KVM_IOREGION
> +	bool ioregion_interrupted;

Can this field move into ioregion_ctx?

> +	struct {
> +		struct kvm_io_device *dev;
> +		int pio;
> +		void *val;
> +		u8 state;
> +		u64 addr;
> +		int len;
> +		u64 data;
> +		bool in;
> +	} ioregion_ctx;

This struct can be reordered to remove holes between fields.

> +#endif
>  	struct kvm_vcpu_arch arch;
>  };
>  
> diff --git a/include/uapi/linux/ioregion.h b/include/uapi/linux/ioregion.h
> new file mode 100644
> index 000000000000..7898c01f84a1
> --- /dev/null
> +++ b/include/uapi/linux/ioregion.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */

To encourage people to implement the wire protocol even beyond the Linux
syscall environment (e.g. in other hypervisors and VMMs) you could make
the license more permissive:

  /* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) */

Several other <linux/*.h> files do this so that the header can be used
outside Linux without license concerns.

Here is the BSD 3-Clause license:
https://opensource.org/licenses/BSD-3-Clause

> +#ifndef _UAPI_LINUX_IOREGION_H
> +#define _UAPI_LINUX_IOREGION_H

Please add the wire protocol specification/documentation into this file.
That way this header file will serve as a comprehensive reference for
the protocol and changes to the header will also update the
documentation.

(The ioctl KVM_SET_IOREGIONFD parts belong in
Documentation/virt/kvm/api.rst but the wire protocol should be in this
header file instead.)

> +
> +/* Wire protocol */
> +struct ioregionfd_cmd {
> +	__u32 info;
> +	__u32 padding;
> +	__u64 user_data;
> +	__u64 offset;
> +	__u64 data;
> +};
> +
> +struct ioregionfd_resp {
> +	__u64 data;
> +	__u8 pad[24];
> +};
> +
> +#define IOREGIONFD_CMD_READ    0
> +#define IOREGIONFD_CMD_WRITE   1
> +
> +#define IOREGIONFD_SIZE_8BIT   0
> +#define IOREGIONFD_SIZE_16BIT  1
> +#define IOREGIONFD_SIZE_32BIT  2
> +#define IOREGIONFD_SIZE_64BIT  3

It's possible that larger read/write operations will be needed in the
future. For example, the PCI Express bus supports much larger
transactions than just 64 bits.

You don't need to address this right now but I wanted to mention it.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 88b92fc3da51..df387857f51f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4193,6 +4193,7 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
>  			      struct kvm_io_range *range, const void *val)
>  {
>  	int idx;
> +	int ret = 0;
>  
>  	idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
>  	if (idx < 0)
> @@ -4200,9 +4201,12 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
>  
>  	while (idx < bus->dev_count &&
>  		kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
> -		if (!kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
> -					range->len, val))
> +		ret = kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
> +					 range->len, val);
> +		if (!ret)
>  			return idx;
> +		if (ret < 0 && ret != -EOPNOTSUPP)
> +			return ret;

I audited all kvm_io_bus_read/write() callers to check that it's safe to
add error return values besides -EOPNOTSUPP. Extending the meaning of
the return value is fine but any arches that want to support ioregionfd
need to explicitly handle -EINTR return values now. Only x86 does after
this patch.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux