Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
> them to user space, which saves the time spent switching to user space and back.
> 
> Both real and virtual modes are supported. The kernel first tries to
> handle a TCE request in real mode; if that fails, it passes the request
> to virtual mode to complete the operation. If the virtual mode
> handler fails, the request is passed to user space.
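
The fallback order described above can be sketched in plain user-space C. This is only an illustration of the control flow, not code from the patch: the result codes, the level names and dispatch_put_tce() are hypothetical, and the handlers are stubs.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; not the patch's constants. */
enum tce_result {
	TCE_DONE,	/* request fully handled at this level */
	TCE_TOO_HARD,	/* handler gave up, try the next level */
};

/* Which level ended up completing the request. */
enum tce_level { LEVEL_REAL, LEVEL_VIRT, LEVEL_USER };

typedef enum tce_result (*tce_handler_fn)(unsigned long ioba,
					  unsigned long tce);

/*
 * Try the real mode handler first, then the virtual mode handler.
 * LEVEL_USER means both in-kernel handlers gave up (or were absent)
 * and user space has to finish the request.
 */
static enum tce_level dispatch_put_tce(tce_handler_fn real_mode,
				       tce_handler_fn virt_mode,
				       unsigned long ioba, unsigned long tce)
{
	if (real_mode && real_mode(ioba, tce) == TCE_DONE)
		return LEVEL_REAL;
	if (virt_mode && virt_mode(ioba, tce) == TCE_DONE)
		return LEVEL_VIRT;
	return LEVEL_USER;
}

/* Stub handlers for demonstration only. */
static enum tce_result always_done(unsigned long ioba, unsigned long tce)
{
	(void)ioba;
	(void)tce;
	return TCE_DONE;
}

static enum tce_result always_too_hard(unsigned long ioba, unsigned long tce)
{
	(void)ioba;
	(void)tce;
	return TCE_TOO_HARD;
}
```

With the stubs, dispatch_put_tce(always_too_hard, always_done, 0, 0) reports LEVEL_VIRT, and passing always_too_hard for both handlers reports LEVEL_USER.
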
> 
> The first user of this is VFIO on POWER. Trampolines to the VFIO external
> user API functions are required for this patch.
> 
> This adds a "SPAPR TCE IOMMU" KVM device to associate a logical bus
> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
> of map/unmap requests. The device supports a single attribute which is
> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
> establishes the connection between KVM and VFIO.
> 
> Tests show that this patch increases transmission speed from 220MB/s
> to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).
> 
> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxx>
> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> 
> ---
> 
> Changes:
> v9:
> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
> KVM device
> * release_spapr_tce_table() is not shared between different TCE types
> * reduced the patch size by moving VFIO external API
> trampolines to a separate patch
> * moved documentation from Documentation/virtual/kvm/api.txt to
> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> 
> v8:
> * fixed warnings from checkpatch.pl
> 
> 2013/07/11:
> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
> for KVM_BOOK3S_64
> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
> for this here but the next patch for hugepages support will use it more.
> 
> 2013/07/06:
> * added realmode arch_spin_lock to protect TCE table from races
> in real and virtual modes
> * POWERPC IOMMU API is changed to support real mode
> * iommu_take_ownership and iommu_release_ownership are protected by
> iommu_table's locks
> * VFIO external user API use rewritten
> * multiple small fixes
> 
> 2013/06/27:
> * tce_list page is referenced now in order to protect it from accidental
> invalidation during H_PUT_TCE_INDIRECT execution
> * added use of the external user VFIO API
> 
> 2013/06/05:
> * changed capability number
> * changed ioctl number
> * update the doc article number
> 
> 2013/05/20:
> * removed get_user() from real mode handlers
> * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
> translated TCEs, tries realmode_get_page() on those and if it fails, it
> passes control over the virtual mode handler which tries to finish
> the request handling
> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
> on a page
> * The only reason to pass the request to user mode now is when user mode
> did not register the TCE table in the kernel; in all other cases the virtual
> mode handler is expected to do the job
> ---
>  .../virtual/kvm/devices/spapr_tce_iommu.txt        |  37 +++
>  arch/powerpc/include/asm/kvm_host.h                |   4 +
>  arch/powerpc/kvm/book3s_64_vio.c                   | 310 ++++++++++++++++++++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c                | 122 ++++++++
>  arch/powerpc/kvm/powerpc.c                         |   1 +
>  include/linux/kvm_host.h                           |   1 +
>  virt/kvm/kvm_main.c                                |   5 +
>  7 files changed, 477 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> 
> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> new file mode 100644
> index 0000000..4bc8fc3
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> @@ -0,0 +1,37 @@
> +SPAPR TCE IOMMU device
> +
> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> +Architectures: powerpc
> +
> +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
> +
> +Groups:
> +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
> +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
> +
> +This is a completely made-up device which provides an API to link
> +a logical bus number (LIOBN) and an IOMMU group. User space has
> +to create a new SPAPR TCE IOMMU device per logical bus.
> +
Why not have one device that can handle multiple links?
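
To make the question concrete: a single device could keep a small table of links keyed by LIOBN. The following is a user-space sketch under that assumption, not code from the patch; struct tce_link, struct tce_iommu_dev, MAX_LINKS and both helpers are hypothetical names, and locking is left out entirely.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LINKS 8	/* arbitrary limit chosen for the sketch */

/* One LIOBN <-> IOMMU group fd association. */
struct tce_link {
	uint64_t liobn;
	int iommu_fd;
	int in_use;
};

/* A single device holding several links instead of exactly one. */
struct tce_iommu_dev {
	struct tce_link links[MAX_LINKS];
};

/* Return the link registered for liobn, or NULL if there is none. */
static struct tce_link *tce_dev_find(struct tce_iommu_dev *dev,
				     uint64_t liobn)
{
	for (size_t i = 0; i < MAX_LINKS; i++)
		if (dev->links[i].in_use && dev->links[i].liobn == liobn)
			return &dev->links[i];
	return NULL;
}

/*
 * Register a new link. Returns 0 on success, -EEXIST if the LIOBN is
 * already linked, -ENOSPC if the table is full.
 */
static int tce_dev_add_link(struct tce_iommu_dev *dev, uint64_t liobn,
			    int iommu_fd)
{
	if (tce_dev_find(dev, liobn))
		return -EEXIST;

	for (size_t i = 0; i < MAX_LINKS; i++) {
		if (!dev->links[i].in_use) {
			dev->links[i].liobn = liobn;
			dev->links[i].iommu_fd = iommu_fd;
			dev->links[i].in_use = 1;
			return 0;
		}
	}
	return -ENOSPC;
}
```

With this shape, setting the linkage attribute a second time with a different LIOBN would add another entry rather than require a second device.
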

> +LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
> +(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
> +IOMMU group is a minimal isolated device set which can be passed to
> +the user space via VFIO.
> +
> +Right after creation the device is in an uninitialized state and requires
> +a KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
> +The attribute contains liobn, IOMMU fd and flags:
> +
> +struct kvm_create_spapr_tce_iommu_linkage {
> +	__u64 liobn;
> +	__u32 fd;
> +	__u32 flags;
> +};
> +
> +The user space creates the SPAPR TCE IOMMU device, obtains
> +an IOMMU fd via VFIO ABI and sets the attribute to the SPAPR TCE IOMMU
> +device. At the moment of setting the attribute, the SPAPR TCE IOMMU
> +device links the LIOBN to the IOMMU group and takes the necessary steps
> +to make sure that the VFIO group will not disappear before KVM is destroyed.
> +
> +The kernel advertises this feature via KVM_CAP_SPAPR_TCE_IOMMU capability.
[skip]

> +
> +static int kvmppc_spapr_tce_iommu_get_attr(struct kvm_device *dev,
> +		struct kvm_device_attr *attr)
> +{
> +	struct kvmppc_spapr_tce_table *tt = dev->private;
> +	void __user *argp = (void __user *) attr->addr;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE:
> +		if (!tt)
> +			return -EFAULT;
This does not look like the correct error code to return here. -EINVAL, maybe?

> +		if (copy_to_user(argp, &tt->link, sizeof(tt->link)))
> +			return -EFAULT;
> +		return 0;
> +	}
> +	return -ENXIO;
> +}
> +
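
For illustration, here is a user-space model of the handler with the suggested error code. It is a sketch, not the patch: struct linkage mirrors the documented attribute layout, ATTR_LINKAGE is a hypothetical group number, and mock_copy_to_user() stands in for the kernel's copy_to_user(to, from, n), which returns the number of bytes it could not copy.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the attribute layout from the documentation hunk. */
struct linkage {
	uint64_t liobn;
	uint32_t fd;
	uint32_t flags;
};

struct tce_table {
	struct linkage link;
};

#define ATTR_LINKAGE 0	/* hypothetical group number for the sketch */

/*
 * Stand-in for copy_to_user(): destination first, then source.
 * Returns the number of bytes NOT copied; a NULL destination models
 * a faulting user pointer.
 */
static unsigned long mock_copy_to_user(void *to, const void *from,
				       unsigned long n)
{
	if (!to)
		return n;
	memcpy(to, from, n);
	return 0;
}

static int get_attr(const struct tce_table *tt, uint32_t group, void *argp)
{
	switch (group) {
	case ATTR_LINKAGE:
		/* No linkage set yet: a bad request, not a bad address. */
		if (!tt)
			return -EINVAL;
		/* -EFAULT is kept for a copy that actually faults. */
		if (mock_copy_to_user(argp, &tt->link, sizeof(tt->link)))
			return -EFAULT;
		return 0;
	}
	/* Unknown attribute group. */
	return -ENXIO;
}
```

This keeps the three failure cases distinguishable from user space: -EINVAL for a device that has no linkage yet, -EFAULT for a bad buffer, -ENXIO for an unknown group.
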

--
			Gleb.
