On Fri, Mar 11, 2016 at 10:09:50AM +1100, Alexey Kardashevskiy wrote: > On 03/10/2016 04:21 PM, David Gibson wrote: > >On Wed, Mar 09, 2016 at 08:20:12PM +1100, Alexey Kardashevskiy wrote: > >>On 03/09/2016 04:45 PM, David Gibson wrote: > >>>On Mon, Mar 07, 2016 at 02:41:17PM +1100, Alexey Kardashevskiy wrote: > >>>>sPAPR TCE IOMMU is para-virtualized and the guest does map/unmap > >>>>via hypercalls which take a logical bus id (LIOBN) as a target IOMMU > >>>>identifier. LIOBNs are made up, advertised to guest systems and > >>>>linked to IOMMU groups by the user space. > >>>>In order to enable acceleration for IOMMU operations in KVM, we need > >>>>to tell KVM the information about the LIOBN-to-group mapping. > >>>> > >>>>For that, a new KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN parameter > >>>>is added which accepts: > >>>>- a VFIO group fd and IO base address to find the actual hardware > >>>>TCE table; > >>>>- a LIOBN to assign to the found table. > >>>> > >>>>Before notifying KVM about new link, this check the group for being > >>>>registered with KVM device in order to release them at unexpected KVM > >>>>finish. > >>>> > >>>>This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user > >>>>space. > >>>> > >>>>While we are here, this also fixes VFIO KVM device compiling to let it > >>>>link to a KVM module. > >>>> > >>>>Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> > >>>>--- > >>>> Documentation/virtual/kvm/devices/vfio.txt | 21 +++++- > >>>> arch/powerpc/kvm/Kconfig | 1 + > >>>> arch/powerpc/kvm/Makefile | 5 +- > >>>> arch/powerpc/kvm/powerpc.c | 1 + > >>>> include/uapi/linux/kvm.h | 9 +++ > >>>> virt/kvm/vfio.c | 106 +++++++++++++++++++++++++++++ > >>>> 6 files changed, 140 insertions(+), 3 deletions(-) > >>>> > >>>>diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt > >>>>index ef51740..c0d3eb7 100644 > >>>>--- a/Documentation/virtual/kvm/devices/vfio.txt > >>>>+++ b/Documentation/virtual/kvm/devices/vfio.txt > >>>>@@ -16,7 +16,24 @@ Groups: > >>>> > >>>> KVM_DEV_VFIO_GROUP attributes: > >>>> KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking > >>>>+ kvm_device_attr.addr points to an int32_t file descriptor > >>>>+ for the VFIO group. > >>> > >>>AFAICT these changes are accurate for VFIO as it is already, in which > >>>case it might be clearer to put them in a separate patch. > >>> > >>>> KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking > >>>>+ kvm_device_attr.addr points to an int32_t file descriptor > >>>>+ for the VFIO group. > >>>> > >>>>-For each, kvm_device_attr.addr points to an int32_t file descriptor > >>>>-for the VFIO group. > >>>>+ KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: sets a liobn for a VFIO group > >>>>+ kvm_device_attr.addr points to a struct: > >>>>+ struct kvm_vfio_spapr_tce_liobn { > >>>>+ __u32 argsz; > >>>>+ __s32 fd; > >>>>+ __u32 liobn; > >>>>+ __u8 pad[4]; > >>>>+ __u64 start_addr; > >>>>+ }; > >>>>+ where > >>>>+ @argsz is the size of kvm_vfio_spapr_tce_liobn; > >>>>+ @fd is a file descriptor for a VFIO group; > >>>>+ @liobn is a logical bus id to be associated with the group; > >>>>+ @start_addr is a DMA window offset on the IO (PCI) bus > >>> > >>>For the cause of DDW and multiple windows, I'm assuming you can call > >>>this multiple times with different LIOBNs and the same IOMMU group? > >> > >> > >>Yes. It is called twice per each group (when DDW is activated) - for 32bit > >>and 64bit windows, this is why @start_addr is there. > >> > >> > >>>>diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig > >>>>index 1059846..dfa3488 100644 > >>>>--- a/arch/powerpc/kvm/Kconfig > >>>>+++ b/arch/powerpc/kvm/Kconfig > >>>>@@ -65,6 +65,7 @@ config KVM_BOOK3S_64 > >>>> select KVM > >>>> select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE > >>>> select SPAPR_TCE_IOMMU if IOMMU_SUPPORT > >>>>+ select KVM_VFIO if VFIO > >>>> ---help--- > >>>> Support running unmodified book3s_64 and book3s_32 guest kernels > >>>> in virtual machines on book3s_64 host processors. > >>>>diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile > >>>>index 7f7b6d8..71f577c 100644 > >>>>--- a/arch/powerpc/kvm/Makefile > >>>>+++ b/arch/powerpc/kvm/Makefile > >>>>@@ -8,7 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm > >>>> KVM := ../../../virt/kvm > >>>> > >>>> common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ > >>>>- $(KVM)/eventfd.o $(KVM)/vfio.o > >>>>+ $(KVM)/eventfd.o > >>> > >>>Please don't disable the VFIO device for the non-book3s case. I added > >>>it (even though it didn't do anything until now) so that libvirt > >>>wouldn't choke when it finds it's not available. Obviously the new > >>>ioctl needs to be only for the right IOMMU setup, but the device > >>>itself should be available always. > >> > >>Ah. Ok, I'll fix this. I just wanted to be able to compile kvm as a module. > >> > >> > >>>> CFLAGS_e500_mmu.o := -I. > >>>> CFLAGS_e500_mmu_host.o := -I. > >>>>@@ -87,6 +87,9 @@ endif > >>>> kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \ > >>>> book3s_xics.o > >>>> > >>>>+kvm-book3s_64-objs-$(CONFIG_KVM_VFIO) += \ > >>>>+ $(KVM)/vfio.o \ > >>>>+ > >>>> kvm-book3s_64-module-objs += \ > >>>> $(KVM)/kvm_main.o \ > >>>> $(KVM)/eventfd.o \ > >>>>diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > >>>>index 19aa59b..63f188d 100644 > >>>>--- a/arch/powerpc/kvm/powerpc.c > >>>>+++ b/arch/powerpc/kvm/powerpc.c > >>>>@@ -521,6 +521,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > >>>> #ifdef CONFIG_PPC_BOOK3S_64 > >>>> case KVM_CAP_SPAPR_TCE: > >>>> case KVM_CAP_SPAPR_TCE_64: > >>>>+ case KVM_CAP_SPAPR_TCE_VFIO: > >>>> case KVM_CAP_PPC_ALLOC_HTAB: > >>>> case KVM_CAP_PPC_RTAS: > >>>> case KVM_CAP_PPC_FIXUP_HCALL: > >>>>diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >>>>index 080ffbf..f1abbea 100644 > >>>>--- a/include/uapi/linux/kvm.h > >>>>+++ b/include/uapi/linux/kvm.h > >>>>@@ -1056,6 +1056,7 @@ struct kvm_device_attr { > >>>> #define KVM_DEV_VFIO_GROUP 1 > >>>> #define KVM_DEV_VFIO_GROUP_ADD 1 > >>>> #define KVM_DEV_VFIO_GROUP_DEL 2 > >>>>+#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN 3 > >>>> > >>>> enum kvm_device_type { > >>>> KVM_DEV_TYPE_FSL_MPIC_20 = 1, > >>>>@@ -1075,6 +1076,14 @@ enum kvm_device_type { > >>>> KVM_DEV_TYPE_MAX, > >>>> }; > >>>> > >>>>+struct kvm_vfio_spapr_tce_liobn { > >>>>+ __u32 argsz; > >>>>+ __s32 fd; > >>>>+ __u32 liobn; > >>>>+ __u8 pad[4]; > >>>>+ __u64 start_addr; > >>>>+}; > >>>>+ > >>>> /* > >>>> * ioctls for VM fds > >>>> */ > >>>>diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > >>>>index 1dd087d..87c771e 100644 > >>>>--- a/virt/kvm/vfio.c > >>>>+++ b/virt/kvm/vfio.c > >>>>@@ -20,6 +20,10 @@ > >>>> #include <linux/vfio.h> > >>>> #include "vfio.h" > >>>> > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>>>+#include <asm/kvm_ppc.h> > >>>>+#endif > >>>>+ > >>>> struct kvm_vfio_group { > >>>> struct list_head node; > >>>> struct vfio_group *vfio_group; > >>>>@@ -60,6 +64,22 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group) > >>>> symbol_put(vfio_group_put_external_user); > >>>> } > >>>> > >>>>+static int kvm_vfio_external_user_iommu_id(struct vfio_group *vfio_group) > >>>>+{ > >>>>+ int (*fn)(struct vfio_group *); > >>>>+ int ret = -1; > >>> > >>>Should this be -ESOMETHING? > >>> > >>>>+ fn = symbol_get(vfio_external_user_iommu_id); > >>>>+ if (!fn) > >>>>+ return ret; > >>>>+ > >>>>+ ret = fn(vfio_group); > >>>>+ > >>>>+ symbol_put(vfio_external_user_iommu_id); > >>>>+ > >>>>+ return ret; > >>>>+} > >>>>+ > >>>> static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group) > >>>> { > >>>> long (*fn)(struct vfio_group *, unsigned long); > >>>>@@ -110,6 +130,22 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev) > >>>> mutex_unlock(&kv->lock); > >>>> } > >>>> > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>>>+static void kvm_vfio_spapr_detach_iommu_group(struct kvm *kvm, > >>>>+ struct vfio_group *vfio_group) > >>> > >>> > >>>Shouldn't this go in the same patch that introduced the attach > >>>function? > >> > >>Having less patches which touch different maintainers areas is better. I > >>cannot avoid touching both PPC KVM and VFIO in this patch but I can in > >>"[PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE > >>table". > >> > >> > >>> > >>>>+{ > >>>>+ int group_id; > >>>>+ struct iommu_group *grp; > >>>>+ > >>>>+ group_id = kvm_vfio_external_user_iommu_id(vfio_group); > >>>>+ grp = iommu_group_get_by_id(group_id); > >>>>+ if (grp) { > >>>>+ kvm_spapr_tce_detach_iommu_group(kvm, grp); > >>>>+ iommu_group_put(grp); > >>>>+ } > >>>>+} > >>>>+#endif > >>>>+ > >>>> static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg) > >>>> { > >>>> struct kvm_vfio *kv = dev->private; > >>>>@@ -186,6 +222,10 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg) > >>>> continue; > >>>> > >>>> list_del(&kvg->node); > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>> > >>>Better to make a no-op version of the call than have to #ifdef at the > >>>callsite. > >> > >>It is questionable. A x86 reader may decide that > >>KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN is implemented for x86 and get > >>confused. > >> > >> > >>> > >>>>+ kvm_vfio_spapr_detach_iommu_group(dev->kvm, > >>>>+ kvg->vfio_group); > >>>>+#endif > >>>> kvm_vfio_group_put_external_user(kvg->vfio_group); > >>>> kfree(kvg); > >>>> ret = 0; > >>>>@@ -201,6 +241,69 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg) > >>>> kvm_vfio_update_coherency(dev); > >>>> > >>>> return ret; > >>>>+ > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>>>+ case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: { > >>>>+ struct kvm_vfio_spapr_tce_liobn param; > >>>>+ unsigned long minsz; > >>>>+ struct kvm_vfio *kv = dev->private; > >>>>+ struct vfio_group *vfio_group; > >>>>+ struct kvm_vfio_group *kvg; > >>>>+ struct fd f; > >>>>+ > >>>>+ minsz = offsetofend(struct kvm_vfio_spapr_tce_liobn, > >>>>+ start_addr); > >>>>+ > >>>>+ if (copy_from_user(¶m, (void __user *)arg, minsz)) > >>>>+ return -EFAULT; > >>>>+ > >>>>+ if (param.argsz < minsz) > >>>>+ return -EINVAL; > >>>>+ > >>>>+ f = fdget(param.fd); > >>>>+ if (!f.file) > >>>>+ return -EBADF; > >>>>+ > >>>>+ vfio_group = kvm_vfio_group_get_external_user(f.file); > >>>>+ fdput(f); > >>>>+ > >>>>+ if (IS_ERR(vfio_group)) > >>>>+ return PTR_ERR(vfio_group); > >>>>+ > >>>>+ ret = -ENOENT; > >>> > >>>Shouldn't there be some runtime test for the type of the IOMMU? It's > >>>possible a kernel could be built for a platform supporting multiple > >>>IOMMU types. > >> > >>Well, may make sense but I do not know to test that. The IOMMU type is a > >>VFIO container property, not a group property and here (KVM) we only have > >>groups. > > > >Which, as mentioned previously, is broken. > > Which I am failing to follow you on this. > > What I am trying to achieve here is pretty much referencing a group so it > cannot be reused. Plus LIOBNs. "Plus LIOBNs" is not a trivial change. You are establishing a linkage from LIOBNs to groups. But that doesn't make sense; if mapping in one (guest) LIOBN affects a group it must affect all groups in the container. i.e. LIOBN->container is the natural mapping, *not* LIOBN to group. > Passing a container fd does not make much > sense here as the VFIO device would walk through groups, reference them and > that is it, there is no locking on VFIO containters and so far there was no > need to teach KVM about containers. > > What do I miss now? Referencing the groups is essentially just a useful side effect. The important functionality is informing VFIO of the guest LIOBNs; and LIOBNs map to containers, not groups. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Attachment:
signature.asc
Description: PGP signature