Hi Alex, On 06/10/2016 22:42, Alex Williamson wrote: > On Thu, 6 Oct 2016 14:20:40 -0600 > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > >> On Thu, 6 Oct 2016 08:45:31 +0000 >> Eric Auger <eric.auger@xxxxxxxxxx> wrote: >> >>> This patch allows the user-space to retrieve the MSI geometry. The >>> implementation is based on capability chains, now also added to >>> VFIO_IOMMU_GET_INFO. >>> >>> The returned info comprise: >>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and >>> in the positive, the start/end of the aperture, >>> - or whether the IOVA aperture need to be set by the userspace. In that >>> case, the size and alignment of the IOVA window to be provided are >>> returned. >>> >>> In case the userspace must provide the IOVA aperture, we currently report >>> a size/alignment based on all the doorbells registered by the host kernel. >>> This may exceed the actual needs. >>> >>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx> >>> >>> --- >>> v11 -> v11: >>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages >>> >>> v9 -> v10: >>> - move cap_offset after iova_pgsizes >>> - replace __u64 alignment by __u32 order >>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and >>> fix alignment >>> - call msi-doorbell API to compute the size/alignment >>> >>> v8 -> v9: >>> - use iommu_msi_supported flag instead of programmable >>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated >>> capability chain, reporting the MSI geometry >>> >>> v7 -> v8: >>> - use iommu_domain_msi_geometry >>> >>> v6 -> v7: >>> - remove the computation of the number of IOVA pages to be provisionned. >>> This number depends on the domain/group/device topology which can >>> dynamically change. Let's rely instead rely on an arbitrary max depending >>> on the system >>> >>> v4 -> v5: >>> - move msi_info and ret declaration within the conditional code >>> >>> v3 -> v4: >>> - replace former vfio_domains_require_msi_mapping by >>> more complex computation of MSI mapping requirements, especially the >>> number of pages to be provided by the user-space. >>> - reword patch title >>> >>> RFC v1 -> v1: >>> - derived from >>> [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state >>> - renamed allow_msi_reconfig into require_msi_mapping >>> - fixed VFIO_IOMMU_GET_INFO >>> --- >>> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++- >>> include/uapi/linux/vfio.h | 32 ++++++++++++++++- >>> 2 files changed, 108 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>> index dc3ee5d..ce5e7eb 100644 >>> --- a/drivers/vfio/vfio_iommu_type1.c >>> +++ b/drivers/vfio/vfio_iommu_type1.c >>> @@ -38,6 +38,8 @@ >>> #include <linux/workqueue.h> >>> #include <linux/dma-iommu.h> >>> #include <linux/msi-doorbell.h> >>> +#include <linux/irqdomain.h> >>> +#include <linux/msi.h> >>> >>> #define DRIVER_VERSION "0.2" >>> #define DRIVER_AUTHOR "Alex Williamson <alex.williamson@xxxxxxxxxx>" >>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu) >>> return ret; >>> } >>> >>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu, >>> + struct vfio_info_cap *caps) >>> +{ >>> + struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry; >>> + unsigned long order = __ffs(vfio_pgsize_bitmap(iommu)); >>> + struct iommu_domain_msi_geometry msi_geometry; >>> + struct vfio_info_cap_header *header; >>> + struct vfio_domain *d; >>> + bool reserved; >>> + size_t size; >>> + >>> + mutex_lock(&iommu->lock); >>> + /* All domains have same require_msi_map property, pick first */ >>> + d = list_first_entry(&iommu->domain_list, struct vfio_domain, next); >>> + iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY, >>> + &msi_geometry); >>> + reserved = !msi_geometry.iommu_msi_supported; >>> + >>> + mutex_unlock(&iommu->lock); >>> + >>> + size = sizeof(*vfio_msi_geometry); >>> + header = vfio_info_cap_add(caps, size, >>> + VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1); >>> + >>> + if (IS_ERR(header)) >>> + return PTR_ERR(header); >>> + >>> + vfio_msi_geometry = container_of(header, >>> + struct vfio_iommu_type1_info_cap_msi_geometry, >>> + header); >>> + >>> + vfio_msi_geometry->flags = reserved; >> >> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED >> >>> + if (reserved) { >>> + vfio_msi_geometry->aperture_start = msi_geometry.aperture_start; >>> + vfio_msi_geometry->aperture_end = msi_geometry.aperture_end; >> >> But maybe nobody has set these, did you intend to use >> iommu_domain_msi_aperture_valid(), which you defined early on but never >> used? >> >>> + return 0; >>> + } >>> + >>> + vfio_msi_geometry->order = order; >> >> I'm tempted to suggest that a user could do the same math on their own >> since we provide the supported bitmap already... could it ever not be >> the same? >> >>> + /* >>> + * we compute a system-wide requirement based on all the registered >>> + * doorbells >>> + */ >>> + vfio_msi_geometry->size = >>> + msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order); >>> + >>> + return 0; >>> +} >>> + >>> static long vfio_iommu_type1_ioctl(void *iommu_data, >>> unsigned int cmd, unsigned long arg) >>> { >>> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, >>> } >>> } else if (cmd == VFIO_IOMMU_GET_INFO) { >>> struct vfio_iommu_type1_info info; >>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; >>> + int ret; >>> >>> - minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes); >>> + minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset); >>> >>> if (copy_from_user(&info, (void __user *)arg, minsz)) >>> return -EFAULT; >>> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, >>> >>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu); >>> >>> + ret = compute_msi_geometry_caps(iommu, &caps); >>> + if (ret) >>> + return ret; >>> + >>> + if (caps.size) { >>> + info.flags |= VFIO_IOMMU_INFO_CAPS; >>> + if (info.argsz < sizeof(info) + caps.size) { >>> + info.argsz = sizeof(info) + caps.size; >>> + info.cap_offset = 0; >>> + } else { >>> + vfio_info_cap_shift(&caps, sizeof(info)); >>> + if (copy_to_user((void __user *)arg + >>> + sizeof(info), caps.buf, >>> + caps.size)) { >>> + kfree(caps.buf); >>> + return -EFAULT; >>> + } >>> + info.cap_offset = sizeof(info); >>> + } >>> + >>> + kfree(caps.buf); >>> + } >>> + >>> return copy_to_user((void __user *)arg, &info, minsz) ? >>> -EFAULT : 0; >>> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>> index 4a9dbc2..8dae013 100644 >>> --- a/include/uapi/linux/vfio.h >>> +++ b/include/uapi/linux/vfio.h >>> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info { >>> __u32 argsz; >>> __u32 flags; >>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ >>> - __u64 iova_pgsizes; /* Bitmap of supported page sizes */ >>> +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */ >>> + __u64 iova_pgsizes; /* Bitmap of supported page sizes */ >>> + __u32 __resv; >>> + __u32 cap_offset; /* Offset within info struct of first cap */ >>> +}; >> >> I understand the padding, but not the ordering. Why not end with >> padding? >> >>> + >>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY 1 >>> + >>> +/* >>> + * The MSI geometry capability allows to report the MSI IOVA geometry: >>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture >>> + * whose boundaries are given by [@aperture_start, @aperture_end]. >>> + * this is typically the case on x86 host. The userspace is not allowed >>> + * to map userspace memory at IOVAs intersecting this range using >>> + * VFIO_IOMMU_MAP_DMA. >>> + * - or the MSI IOVAs are not requested to belong to any reserved range; >>> + * in that case the userspace must provide an IOVA window characterized by >>> + * @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag. >>> + */ >>> +struct vfio_iommu_type1_info_cap_msi_geometry { >>> + struct vfio_info_cap_header header; >>> + __u32 flags; >>> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */ >>> + /* not reserved */ >>> + __u32 order; /* iommu page order used for aperture alignment*/ >>> + __u64 size; /* IOVA aperture size (bytes) the userspace must provide */ >>> + /* reserved */ >>> + __u64 aperture_start; >>> + __u64 aperture_end; >> >> Should these be a union? We never set them both. Should the !reserved >> case have a flag as well, so the user can positively identify what's >> being provided? > > Actually, is there really any need to fit both of these within the same > structure? Part of the idea of the capability chains is we can create > a capability for each new thing we want to describe. So, we could > simply define a generic reserved IOVA range capability with a 'start' > and 'end' and then another capability to define MSI mapping > requirements. Thanks, Yes your suggested approach makes sense to me. One reason why I proceeded that way is we are mixing things at iommu.h level too. Personally I would have preferred to separate things: 1) add a new IOMMU_CAP_TRANSLATE_MSI capability in iommu_cap 2) rename iommu_msi_supported into "programmable" bool: reporting whether the aperture is reserved or programmable. In the early releases I think it was as above but slightly we moved to a mixed description. What do you think? Thank you for the whole review! Eric > > Alex > >>> }; >>> >>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) >>> @@ -503,6 +531,8 @@ struct vfio_iommu_type1_info { >>> * IOVA region that will be used on some platforms to map the host MSI frames. >>> * In that specific case, vaddr is ignored. Once registered, an MSI reserved >>> * IOVA region stays until the container is closed. >>> + * The requirement for provisioning such reserved IOVA range can be checked by >>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability. >>> */ >>> struct vfio_iommu_type1_dma_map { >>> __u32 argsz; >> > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html