On Fri, 1 Dec 2017 10:38:07 +0100 Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote: > On 30/11/2017 19:30, Alex Williamson wrote: > > On Thu, 30 Nov 2017 16:11:35 +0100 > > Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote: > > > >> On 30/11/2017 15:08, Alex Williamson wrote: > >>> On Thu, 30 Nov 2017 12:34:38 +0100 > >>> Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote: > >>> > >>>> When userland VFIO defines a new IOMMU for a guest it may > >>>> want to specify to the guest the physical limits of > >>>> the underlying host IOMMU to avoid access to forbidden > >>>> memory ranges. > >>>> > >>>> Currently, the vfio_iommu_type1 driver does not report this > >>>> information to userland. > >>>> > >>>> Let's extend the vfio_iommu_type1_info structure reported > >>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the > >>>> IOMMU limits as new uint64_t entries aperture_start and > >>>> aperture_end. > >>>> > >>>> Let's also extend the flags bit map to add a flag specifying > >>>> if this extension of the info structure is reported or not. > >>>> > >>>> Signed-off-by: Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> > >>>> --- > >>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++ > >>>> include/uapi/linux/vfio.h | 3 +++ > >>>> 2 files changed, 45 insertions(+) > >>>> > >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >>>> index 8549cb1..7da5fe0 100644 > >>>> --- a/drivers/vfio/vfio_iommu_type1.c > >>>> +++ b/drivers/vfio/vfio_iommu_type1.c > >>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu) > >>>> return ret; > >>>> } > >>>> > >>>> +/** > >>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu > >>>> + * @iommu: the current vfio_iommu > >>>> + * @start: a pointer to the aperture start > >>>> + * @end : a pointer to the aperture end > >>>> + * > >>>> + * This function iterate on the domains using the given vfio_iommu > >>>> + * and restrict the aperture to the minimal aperture common > >>>> + * to all domains sharing this vfio_iommu. > >>>> + */ > >>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start, > >>>> + uint64_t *end) > >>>> +{ > >>>> + struct iommu_domain_geometry geometry; > >>>> + struct vfio_domain *domain; > >>>> + > >>>> + *start = 0; > >>>> + *end = U64_MAX; > >>>> + > >>>> + mutex_lock(&iommu->lock); > >>>> + /* loop on all domains using this vfio_iommu */ > >>>> + list_for_each_entry(domain, &iommu->domain_list, next) { > >>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, > >>>> + &geometry); > >>>> + if (geometry.force_aperture) { > >>>> + if (geometry.aperture_start > *start) > >>>> + *start = geometry.aperture_start; > >>>> + if (geometry.aperture_end < *end) > >>>> + *end = geometry.aperture_end; > >>>> + } > >>>> + } > >>>> + mutex_unlock(&iommu->lock); > >>>> +} > >>>> + > >>>> static long vfio_iommu_type1_ioctl(void *iommu_data, > >>>> unsigned int cmd, unsigned long arg) > >>>> { > >>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, > >>>> > >>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu); > >>>> > >>>> + minsz = min_t(size_t, info.argsz, sizeof(info)); > >>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info, > >>>> + aperture_end)) { > >>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE; > >>>> + vfio_get_aperture(iommu, &info.aperture_start, > >>>> + &info.aperture_end); > >>>> + } > >>>> + > >>>> return copy_to_user((void __user *)arg, &info, minsz) ? > >>>> -EFAULT : 0; > >>>> > >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > >>>> index 0fb25fb..780d909 100644 > >>>> --- a/include/uapi/linux/vfio.h > >>>> +++ b/include/uapi/linux/vfio.h > >>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info { > >>>> __u32 flags; > >>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ > >>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */ > >>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */ > >>>> + __u64 aperture_start; /* start of DMA aperture */ > >>>> + __u64 aperture_end; /* end of DMA aperture */ > >>>> }; > >>>> > >>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) > >>> > >>> This only supports the most simple topology, even x86 cannot claim to > >>> have a single contiguous aperture, it's typically bisected by an MSI > >>> window. I think we need an API that supports one or more apertures > >>> out of the box. Also as Eric indicates, a capability is probably the > >>> better option for creating a flexible structure. Thanks, > >>> > >>> Alex > >>> > >> > >> > >> Yes, I understand that a capability here is a must, I will follow this way. > >> > >> For having multiple aperture and MSI protection, I understood it was > >> done using windows and reserved regions. > >> Can you point me to my error? > > > > See the thread from Huawei, I don't think that's a solved problem: > > > > https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html > > > > If you want sysfs to be consumed separately by the user and fed into > > new QEMU command line options for creating a VM layout, perhaps that's > > sufficient, but I think the vfio api for the iommu should encompass > > describing available ranges of mappable iova space without cobbling > > together arbitrary info from sysfs. Thanks, > > > > Alex > > > > Hi Alex, > > I resume to see if I understood you well: > > We may have physical IOMMUs with a more complex access that can not be > specified by only defining the start and end of a read/write region. > > Windows can be used to reserve regions for the VM but it is not what we > want. What we want is to know what the host can offer which is a mix of > aperture and windows. > > To report this we can use capabilities in a positive way, describing > what the host offers not what it can not provide. > > To achieve this we have to use two interfaces: > - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities > - Physical IOMMU interface with both geometry and window iommu_ops > callbacks. > > If it is sufficiently near from what you thought I will provide a new > version in this direction. I believe so. VFIO would construct a set of mappable iova regions/windows using information provided via the IOMMU API via iommu_ops and expose this via a new capability supporting multiple such regions via the VFIO_IOMMU_GET_INFO ioctl. This ioctl would be extended to support capabilities in the same way we've done so for other vfio ioctls. Thanks, Alex