> From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > Sent: Monday, June 15, 2020 2:05 PM > > Hi Kevin, > > > From: Tian, Kevin <kevin.tian@xxxxxxxxx> > > Sent: Monday, June 15, 2020 9:23 AM > > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > > > Sent: Friday, June 12, 2020 5:05 PM > > > > > > Hi Alex, > > > > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > Sent: Friday, June 12, 2020 3:30 AM > > > > > > > > On Thu, 11 Jun 2020 05:15:21 -0700 > > > > Liu Yi L <yi.l.liu@xxxxxxxxx> wrote: > > > > > > > > > IOMMUs that support nesting translation needs report the > > > > > capability info to userspace, e.g. the format of first level/stage paging > > structures. > > > > > > > > > > Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > > > > > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > > > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > > Cc: Eric Auger <eric.auger@xxxxxxxxxx> > > > > > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> > > > > > Cc: Joerg Roedel <joro@xxxxxxxxxx> > > > > > Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> > > > > > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx> > > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > > > > --- > > > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no > > > > > actual user of it. right? So I'm wondering if we can reuse > > > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your > > > opinions? > > > > > > > > > > include/linux/iommu.h | 1 + > > > > > include/uapi/linux/iommu.h | 34 > > > ++++++++++++++++++++++++++++++++++ > > > > > 2 files changed, 35 insertions(+) > > > > > > > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index > > > > > 78a26ae..f6e4b49 100644 > > > > > --- a/include/linux/iommu.h > > > > > +++ b/include/linux/iommu.h > > > > > @@ -126,6 +126,7 @@ enum iommu_attr { > > > > > DOMAIN_ATTR_FSL_PAMUV1, > > > > > DOMAIN_ATTR_NESTING, /* two stages of translation */ > > > > > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, > > > > > + DOMAIN_ATTR_NESTING_INFO, > > > > > DOMAIN_ATTR_MAX, > > > > > }; > > > > > > > > > > diff --git a/include/uapi/linux/iommu.h > > > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644 > > > > > --- a/include/uapi/linux/iommu.h > > > > > +++ b/include/uapi/linux/iommu.h > > > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data { > > > > > }; > > > > > }; > > > > > > > > > > +struct iommu_nesting_info { > > > > > + __u32 size; > > > > > + __u32 format; > > > > > + __u32 features; > > > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > > > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1) > > > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << > 2) > > > > > + __u32 flags; > > > > > + __u8 data[]; > > > > > +}; > > > > > + > > > > > +/* > > > > > + * @flags: VT-d specific flags. Currently reserved for future > > > > > + * extension. > > > > > + * @addr_width: The output addr width of first level/stage > > translation > > > > > + * @pasid_bits: Maximum supported PASID bits, 0 represents > no > > > PASID > > > > > + * support. > > > > > + * @cap_reg: Describe basic capabilities as defined in VT-d > > > capability > > > > > + * register. > > > > > + * @cap_mask: Mark valid capability bits in @cap_reg. > > > > > + * @ecap_reg: Describe the extended capabilities as defined in VT-d > > > > > + * extended capability register. > > > > > + * @ecap_mask: Mark the valid capability bits in @ecap_reg. > > > > > > > > Please explain this a little further, why do we need to tell > > > > userspace about cap/ecap register bits that aren't valid through this > interface? > > > > Thanks, > > > > > > we only want to tell userspace about the bits marked in the > cap/ecap_mask. > > > cap/ecap_mask is kind of white-list of the cap/ecap register. > > > userspace should only care about the bits in the white-list, for other > > > bits, it should ignore. > > > > > > Regards, > > > Yi Liu > > > > For invalid bits if kernel just clears them then do we still need additional > mask bits > > to explicitly mark them out? I guess this might be the point that Alex asked... > > For invalid bits, kernel will clear them. But I think the mask bits is > still necessary. The mask bits tells user space the bits related to > nesting. Without it, user space may have no idea about it. userspace should know which bit is related to nesting and then should check that bit explicitly... > > Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU > vIOMMU > decides cap/ecap bits according to QEMU cmdline. But not all of them are > compatible with hardware support. Especially, vIOMMU built on nesting. > So needs to sync the cap/ecap bits with host side. Based on the mask > bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with > the cap/ecap bits reported by this interface. This comparation is limited > to the nesting related bits in cap/ecap, the other bits are not included > and can use the configuration by QEMU cmdline. I didn't get this explanation. Based on patch [15/15], nesting capabilities are defined as: +/* Nesting Support Capability Alignment */ +#define VTD_CAP_FL1GP (1ULL << 56) +#define VTD_CAP_FL5LP (1ULL << 60) +#define VTD_ECAP_PRS (1ULL << 29) +#define VTD_ECAP_ERS (1ULL << 30) +#define VTD_ECAP_SRS (1ULL << 31) +#define VTD_ECAP_EAFS (1ULL << 34) +#define VTD_ECAP_PASID (1ULL << 40) When Qemu gets an cmdline option it knows which bit out of above list should be checked against hardware capability. Then just do the check bit-by-bit. Why do we need mask bit in uapi to tell which bits are valid? Unless 0/1 doesn't represent validity of some bit. Do we have such example? > > The link below show the current Intel vIOMMU usage on the cap/ecap bits. > For each assigned device, vIOMMU will compare the nesting related bits in > cap/ecap and mask out the bits which hardware doesn't support. After the > machine is intilized, the vIOMMU cap/ecap bits are determined. If user > hot-plug devices to VM, vIOMMU will fail it if the hardware cap/ecap bits > behind hot-plug device are not compatible with determined vIOMMU > cap/ecap > bits. > > https://www.spinics.net/lists/kvm/msg218294.html > > Regards, > Yi Liu > > > > > > > > Alex > > > > > > > > > > > > > + */ > > > > > +struct iommu_nesting_info_vtd { > > > > > + __u32 flags; > > > > > + __u16 addr_width; > > > > > + __u16 pasid_bits; > > > > > + __u64 cap_reg; > > > > > + __u64 cap_mask; > > > > > + __u64 ecap_reg; > > > > > + __u64 ecap_mask; > > > > > +}; > > > > > + > > > > > #endif /* _UAPI_IOMMU_H */