Hi Alex,

On 1/30/19 12:16 AM, Alex Williamson wrote:
> On Fri, 25 Jan 2019 17:49:20 +0100
> Auger Eric <eric.auger@xxxxxxxxxx> wrote:
>
>> Hi Alex,
>>
>> On 1/11/19 10:30 PM, Alex Williamson wrote:
>>> On Tue, 8 Jan 2019 11:26:14 +0100
>>> Eric Auger <eric.auger@xxxxxxxxxx> wrote:
>>>
>>>> From: "Liu, Yi L" <yi.l.liu@xxxxxxxxxxxxxxx>
>>>>
>>>> In any virtualization use case, when the first translation stage
>>>> is "owned" by the guest OS, the host IOMMU driver has no knowledge
>>>> of caching structure updates unless the guest invalidation activities
>>>> are trapped by the virtualizer and passed down to the host.
>>>>
>>>> Since the invalidation data are obtained from user space and will be
>>>> written into the physical IOMMU, we must allow security checks at
>>>> various layers. Therefore, a generic invalidation data format is
>>>> proposed here; model-specific IOMMU drivers need to convert it into
>>>> their own format.
>>>>
>>>> Signed-off-by: Liu, Yi L <yi.l.liu@xxxxxxxxxxxxxxx>
>>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx>
>>>> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
>>>> Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
>>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>>
>>>> ---
>>>> v1 -> v2:
>>>> - add arch_id field
>>>> - renamed tlb_invalidate into cache_invalidate as this API allows
>>>>   invalidating context caches on top of IOTLBs
>>>>
>>>> v1:
>>>> renamed sva_invalidate into tlb_invalidate and added the iommu_ prefix
>>>> in the header. Commit message reworded.
>>>> ---
>>>>  drivers/iommu/iommu.c      | 14 ++++++
>>>>  include/linux/iommu.h      | 14 ++++++
>>>>  include/uapi/linux/iommu.h | 95 ++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 123 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 0f2b7f1fc7c8..b2e248770508 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -1403,6 +1403,20 @@ int iommu_set_pasid_table(struct iommu_domain *domain,
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(iommu_set_pasid_table);
>>>>
>>>> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>>>> +			   struct iommu_cache_invalidate_info *inv_info)
>>>> +{
>>>> +	int ret = 0;
>>>> +
>>>> +	if (unlikely(!domain->ops->cache_invalidate))
>>>> +		return -ENODEV;
>>>> +
>>>> +	ret = domain->ops->cache_invalidate(domain, dev, inv_info);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>>>> +
>>>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>>>  				  struct device *dev)
>>>>  {
>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>> index 1da2a2357ea4..96d59886f230 100644
>>>> --- a/include/linux/iommu.h
>>>> +++ b/include/linux/iommu.h
>>>> @@ -186,6 +186,7 @@ struct iommu_resv_region {
>>>>   * @of_xlate: add OF master IDs to iommu grouping
>>>>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>>>>   * @set_pasid_table: set pasid table
>>>> + * @cache_invalidate: invalidate translation caches
>>>>   */
>>>>  struct iommu_ops {
>>>>  	bool (*capable)(enum iommu_cap);
>>>> @@ -231,6 +232,9 @@ struct iommu_ops {
>>>>  	int (*set_pasid_table)(struct iommu_domain *domain,
>>>>  			       struct iommu_pasid_table_config *cfg);
>>>>
>>>> +	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
>>>> +				struct iommu_cache_invalidate_info *inv_info);
>>>> +
>>>>  	unsigned long pgsize_bitmap;
>>>>  };
>>>>
>>>> @@ -294,6 +298,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>>>>  						struct device *dev);
>>>>  extern int iommu_set_pasid_table(struct iommu_domain *domain,
>>>>  				 struct iommu_pasid_table_config *cfg);
>>>> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
>>>> +				  struct device *dev,
>>>> +				  struct iommu_cache_invalidate_info *inv_info);
>>>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>>>>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>>>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>>>> @@ -709,6 +716,13 @@ int iommu_set_pasid_table(struct iommu_domain *domain,
>>>>  {
>>>>  	return -ENODEV;
>>>>  }
>>>> +static inline int
>>>> +iommu_cache_invalidate(struct iommu_domain *domain,
>>>> +		       struct device *dev,
>>>> +		       struct iommu_cache_invalidate_info *inv_info)
>>>> +{
>>>> +	return -ENODEV;
>>>> +}
>>>>
>>>>  #endif /* CONFIG_IOMMU_API */
>>>>
>>>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>>>> index 7a7cf7a3de7c..4605f5cfac84 100644
>>>> --- a/include/uapi/linux/iommu.h
>>>> +++ b/include/uapi/linux/iommu.h
>>>> @@ -47,4 +47,99 @@ struct iommu_pasid_table_config {
>>>>  	};
>>>>  };
>>>>
>>>> +/**
>>>> + * enum iommu_inv_granularity - Generic invalidation granularity
>>>> + * @IOMMU_INV_GRANU_DOMAIN_ALL_PASID:	TLB entries or PASID caches of all
>>>> + *					PASIDs associated with a domain ID
>>>> + * @IOMMU_INV_GRANU_PASID_SEL:		TLB entries or PASID cache associated
>>>> + *					with a PASID and a domain
>>>> + * @IOMMU_INV_GRANU_PAGE_PASID:		TLB entries of selected page range
>>>> + *					within a PASID
>>>> + *
>>>> + * When an invalidation request is passed down to the IOMMU to flush
>>>> + * translation caches, it may carry different granularity levels, which
>>>> + * can be specific to certain types of translation caches.
>>>> + * This enum is a collection of granularities for all types of translation
>>>> + * caches. The idea is to make it easy for a model-specific IOMMU driver
>>>> + * to convert from the generic to the model-specific value.
>>>> + * Each IOMMU driver
>>>> + * can enforce checks based on its own conversion table. The conversion
>>>> + * is based on a 2D lookup with the following inputs:
>>>> + * - translation cache type
>>>> + * - granularity
>>>> + *
>>>> + *         type |   DTLB    |    TLB    |   PASID   |
>>>> + * granule      |           |           |   cache   |
>>>> + * -------------+-----------+-----------+-----------+
>>>> + * DN_ALL_PASID |     Y     |     Y     |     Y     |
>>>> + * PASID_SEL    |     Y     |     Y     |     Y     |
>>>> + * PAGE_PASID   |     Y     |     Y     |    N/A    |
>>>> + *
>>>> + */
>>>> +enum iommu_inv_granularity {
>>>> +	IOMMU_INV_GRANU_DOMAIN_ALL_PASID,
>>>> +	IOMMU_INV_GRANU_PASID_SEL,
>>>> +	IOMMU_INV_GRANU_PAGE_PASID,
>>>> +	IOMMU_INV_NR_GRANU,
>>>> +};
>>>> +
>>>> +/**
>>>> + * enum iommu_inv_type - Generic translation cache types for invalidation
>>>> + *
>>>> + * @IOMMU_INV_TYPE_DTLB:	device IOTLB
>>>> + * @IOMMU_INV_TYPE_TLB:		IOMMU paging structure cache
>>>> + * @IOMMU_INV_TYPE_PASID:	PASID cache
>>>> + * Invalidation requests sent to the IOMMU for a given device need to
>>>> + * indicate which type of translation cache is to be operated on. Combined
>>>> + * with enum iommu_inv_granularity, a model-specific driver can do a
>>>> + * simple lookup to convert from the generic to the model-specific value.
>>>> + */
>>>> +enum iommu_inv_type {
>>>> +	IOMMU_INV_TYPE_DTLB,
>>>> +	IOMMU_INV_TYPE_TLB,
>>>> +	IOMMU_INV_TYPE_PASID,
>>>> +	IOMMU_INV_NR_TYPE
>>>> +};
>>>> +
>>>> +/**
>>>> + * Translation cache invalidation header that contains mandatory metadata.
>>>> + * @version:	info format version, expecting future extensions
>>>> + * @type:	type of translation cache to be invalidated
>>>> + */
>>>> +struct iommu_cache_invalidate_hdr {
>>>> +	__u32 version;
>>>> +#define TLB_INV_HDR_VERSION_1 1
>>>> +	enum iommu_inv_type type;
>>>> +};
>>>> +
>>>> +/**
>>>> + * Translation cache invalidation information, contains generic IOMMU
>>>> + * data which can be parsed based on model ID by model-specific drivers.
>>>> + * Since the invalidation of second level page tables is included in the
>>>> + * unmap operation, this info is only applicable to the first level
>>>> + * translation caches, i.e. DMA requests with PASID.
>>>> + *
>>>> + * @granularity:	requested invalidation granularity, type dependent
>>>> + * @size:	2^size of 4K pages, 0 for 4K, 9 for 2MB, etc.
>>>
>>> Why is this a 4K page centric interface?
>>
>> This matches the VT-d Address Mask (AM) field of the IOTLB Invalidate
>> Descriptor. We can pass a log2size instead.
>
> Could some options not require a power of two size?
>
>>>> + * @nr_pages:	number of pages to invalidate
>>>> + * @pasid:	processor address space ID value per PCI spec.
>>>> + * @arch_id:	architecture dependent id characterizing a context
>>>> + *		and tagging the caches, i.e. domain Identifier on VT-d,
>>>> + *		asid on ARM SMMU
>>>> + * @addr:	page address to be invalidated
>>>> + * @flags	IOMMU_INVALIDATE_ADDR_LEAF: leaf paging entries
>>>> + *		IOMMU_INVALIDATE_GLOBAL_PAGE: global pages
>>>
>>> Shouldn't some of these be tied to the granularity of the
>>> invalidation?  It seems like this should be more similar to
>>> iommu_pasid_table_config, where the granularity of the invalidation
>>> defines which entry within a union at the end of the structure is valid
>>> and populated. Otherwise we have fields that don't make sense for
>>> certain invalidations.
>>
>> I am a little bit embarrassed here as this API version is the outcome of
>> long discussions held by Jacob, Jean-Philippe and many others. I don't
>> want to hijack that work as I am "simply" reusing this API. Nevertheless
>> I am willing to help on this.
>> So following your recommendation above I dare to propose an updated API:
>>
>> struct iommu_device_iotlb_inv_info {
>> 	__u32	version;
>> #define IOMMU_DEV_IOTLB_INV_GLOBAL	0
>> #define IOMMU_DEV_IOTLB_INV_SOURCEID	(1 << 0)
>> #define IOMMU_DEV_IOTLB_INV_PASID	(1 << 1)
>> 	__u8	granularity;
>> 	__u64	addr;
>> 	__u8	log2size;
>> 	__u64	sourceid;
>> 	__u64	pasid;
>> 	__u8	padding[2];
>> };
>>
>> struct iommu_iotlb_inv_info {
>> 	__u32	version;
>> #define IOMMU_IOTLB_INV_GLOBAL	0
>> #define IOMMU_IOTLB_INV_ARCHID	(1 << 0)
>> #define IOMMU_IOTLB_INV_PASID	(1 << 1)
>> #define IOMMU_IOTLB_INV_PAGE	(1 << 2)
>> 	__u8	granularity;
>> 	__u64	archid;
>> 	__u64	pasid;
>> 	__u64	addr;
>> 	__u8	log2size;
>> 	__u8	padding[2];
>> };
>>
>> struct iommu_pasid_inv_info {
>> 	__u32	version;
>> #define IOMMU_PASID_INV_GLOBAL	0
>> #define IOMMU_PASID_INV_ARCHID	(1 << 0)
>> #define IOMMU_PASID_INV_PASID	(1 << 1)
>> #define IOMMU_PASID_INV_SOURCEID	(1 << 2)
>> 	__u8	granularity;
>> 	__u64	archid;
>> 	__u64	pasid;
>> 	__u64	sourceid;
>> 	__u8	padding[3];
>> };
>>
>> /**
>>  * Translation cache invalidation information, contains generic IOMMU
>>  * data which can be parsed based on model ID by model-specific drivers.
>>  * Since the invalidation of second level page tables is included in
>>  * the unmap operation, this info is only applicable to the first level
>>  * translation caches, i.e. DMA requests with PASID.
>>  *
>>  */
>> struct iommu_cache_invalidate_info {
>> #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>> 	__u32	version;
>> #define IOMMU_INV_TYPE_IOTLB		1 /* IOMMU paging structure cache */
>> #define IOMMU_INV_TYPE_DEV_IOTLB	2 /* Device IOTLB */
>> #define IOMMU_INV_TYPE_PASID		3 /* PASID cache */
>> 	__u8	type;
>> 	union {
>> 		struct iommu_iotlb_inv_info		iotlb_inv_info;
>> 		struct iommu_device_iotlb_inv_info	dev_iotlb_inv_info;
>> 		struct iommu_pasid_inv_info		pasid_inv_info;
>> 	};
>> };
>>
>> At the moment I ignore the leaf bool parameter used on ARM for PASID
>> invalidation and TLB invalidation.
>> Maybe we can just invalidate more than the leaf cache structures for
>> the moment?
>>
>> On ARM the PASID table can be invalidated per streamid. On VT-d, as far
>> as I understand, the sourceid does not tag the entries.
>>
>>>
>>>> + *
>>>> + */
>>>> +struct iommu_cache_invalidate_info {
>>>> +	struct iommu_cache_invalidate_hdr hdr;
>>>> +	enum iommu_inv_granularity granularity;
>>>
>>> A separate structure for hdr seems a little pointless.
>>
>> removed
>>>
>>>> +	__u32 flags;
>>>> +#define IOMMU_INVALIDATE_ADDR_LEAF	(1 << 0)
>>>> +#define IOMMU_INVALIDATE_GLOBAL_PAGE	(1 << 1)
>>>> +	__u8 size;
>>>
>>> Really need some padding or packing here for any hope of having
>>> consistency with userspace.
>>>
>>>> +	__u64 nr_pages;
>>>> +	__u32 pasid;
>>>
>>> Sub-optimal ordering for packing/padding. Thanks,
>>
>> I introduced some padding above. Is that OK?
>
> No, you're not taking field alignment into account; processors don't
> like unaligned data. If we have:
>
> struct foo {
> 	uint32_t a;
> 	uint8_t  b;
> 	uint64_t c;
> 	uint8_t  d;
> 	uint64_t e;
> };
>
> In memory on a 64-bit system, that would look like:
>
> aaaab...ccccccccd.......eeeeeeee
>
> While on a 32-bit system, it would look like:
>
> aaaab...ccccccccd...eeeeeeee
>
> In this example we have 22 bytes of data (4 + 1 + 8 + 1 + 8), but the
> structure is 32 bytes when provided by a 64-bit userspace or 28 bytes
> when provided by a 32-bit userspace, and the start address of the 'e'
> field changes. A 64-bit kernel would process the latter structure
> incorrectly or fault trying to copy the expected length from userspace.
> Adding padding to the end doesn't solve this.
> If we instead reconstruct the structure as:
>
> struct foo {
> 	uint32_t a;
> 	uint8_t  b;
> 	uint8_t  d;
> 	uint8_t  pad[2];
> 	uint64_t c;
> 	uint64_t e;
> };
>
> Then we create a structure that looks the same from either a 32-bit or
> 64-bit userspace, is only 24 bytes in memory, and works for any
> reasonable compiler, though we might choose to add a packed attribute
> to make sure the compiler doesn't do anything screwy if we were
> paranoid. Thanks,

Thank you very much for your time and explanations. That's understood now.

Eric

>
> Alex
>