> From: Tian, Kevin <kevin.tian@xxxxxxxxx> > Sent: Tuesday, June 16, 2020 9:56 AM > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > > Sent: Monday, June 15, 2020 2:05 PM > > > > Hi Kevin, > > > > > From: Tian, Kevin <kevin.tian@xxxxxxxxx> > > > Sent: Monday, June 15, 2020 9:23 AM > > > > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > > > > Sent: Friday, June 12, 2020 5:05 PM > > > > > > > > Hi Alex, > > > > > > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > > Sent: Friday, June 12, 2020 3:30 AM > > > > > > > > > > On Thu, 11 Jun 2020 05:15:21 -0700 > > > > > Liu Yi L <yi.l.liu@xxxxxxxxx> wrote: > > > > > > > > > > > IOMMUs that support nesting translation needs report the > > > > > > capability info to userspace, e.g. the format of first level/stage paging > > > structures. > > > > > > > > > > > > Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > > > > > > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > > > > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > > > Cc: Eric Auger <eric.auger@xxxxxxxxxx> > > > > > > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> > > > > > > Cc: Joerg Roedel <joro@xxxxxxxxxx> > > > > > > Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> > > > > > > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx> > > > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > > > > > --- > > > > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no > > > > > > actual user of it. right? So I'm wondering if we can reuse > > > > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your > > > > opinions? > > > > > > > > > > > > include/linux/iommu.h | 1 + > > > > > > include/uapi/linux/iommu.h | 34 > > > > ++++++++++++++++++++++++++++++++++ > > > > > > 2 files changed, 35 insertions(+) > > > > > > > > > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index > > > > > > 78a26ae..f6e4b49 100644 > > > > > > --- a/include/linux/iommu.h > > > > > > +++ b/include/linux/iommu.h > > > > > > @@ -126,6 +126,7 @@ enum iommu_attr { > > > > > > DOMAIN_ATTR_FSL_PAMUV1, > > > > > > DOMAIN_ATTR_NESTING, /* two stages of translation */ > > > > > > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, > > > > > > + DOMAIN_ATTR_NESTING_INFO, > > > > > > DOMAIN_ATTR_MAX, > > > > > > }; > > > > > > > > > > > > diff --git a/include/uapi/linux/iommu.h > > > > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644 > > > > > > --- a/include/uapi/linux/iommu.h > > > > > > +++ b/include/uapi/linux/iommu.h > > > > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data { > > > > > > }; > > > > > > }; > > > > > > > > > > > > +struct iommu_nesting_info { > > > > > > + __u32 size; > > > > > > + __u32 format; > > > > > > + __u32 features; > > > > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > > > > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1) > > > > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << > > 2) > > > > > > + __u32 flags; > > > > > > + __u8 data[]; > > > > > > +}; > > > > > > + > > > > > > +/* > > > > > > + * @flags: VT-d specific flags. Currently reserved for future > > > > > > + * extension. > > > > > > + * @addr_width: The output addr width of first level/stage > > > translation > > > > > > + * @pasid_bits: Maximum supported PASID bits, 0 represents > > no > > > > PASID > > > > > > + * support. > > > > > > + * @cap_reg: Describe basic capabilities as defined in VT-d > > > > capability > > > > > > + * register. > > > > > > + * @cap_mask: Mark valid capability bits in @cap_reg. > > > > > > + * @ecap_reg: Describe the extended capabilities as defined in > VT-d > > > > > > + * extended capability register. > > > > > > + * @ecap_mask: Mark the valid capability bits in @ecap_reg. > > > > > > > > > > Please explain this a little further, why do we need to tell > > > > > userspace about cap/ecap register bits that aren't valid through this > > interface? > > > > > Thanks, > > > > > > > > we only want to tell userspace about the bits marked in the > > cap/ecap_mask. > > > > cap/ecap_mask is kind of white-list of the cap/ecap register. > > > > userspace should only care about the bits in the white-list, for other > > > > bits, it should ignore. > > > > > > > > Regards, > > > > Yi Liu > > > > > > For invalid bits if kernel just clears them then do we still need additional > > mask bits > > > to explicitly mark them out? I guess this might be the point that Alex asked... > > > > For invalid bits, kernel will clear them. But I think the mask bits is > > still necessary. The mask bits tells user space the bits related to > > nesting. Without it, user space may have no idea about it. > > userspace should know which bit is related to nesting and then should > check that bit explicitly... ok, so userspace could get such info by the understanding of spec, right? if user space could get it, then I think it's uncessary to have cap/ecap mask bits. > > > > Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU > > vIOMMU > > decides cap/ecap bits according to QEMU cmdline. But not all of them are > > compatible with hardware support. Especially, vIOMMU built on nesting. > > So needs to sync the cap/ecap bits with host side. Based on the mask > > bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with > > the cap/ecap bits reported by this interface. This comparation is limited > > to the nesting related bits in cap/ecap, the other bits are not included > > and can use the configuration by QEMU cmdline. > > I didn't get this explanation. Based on patch [15/15], nesting capabilities > are defined as: > +/* Nesting Support Capability Alignment */ > +#define VTD_CAP_FL1GP (1ULL << 56) > +#define VTD_CAP_FL5LP (1ULL << 60) > +#define VTD_ECAP_PRS (1ULL << 29) > +#define VTD_ECAP_ERS (1ULL << 30) > +#define VTD_ECAP_SRS (1ULL << 31) > +#define VTD_ECAP_EAFS (1ULL << 34) > +#define VTD_ECAP_PASID (1ULL << 40) > > When Qemu gets an cmdline option it knows which bit out of above > list should be checked against hardware capability. Then just do the > check bit-by-bit. Why do we need mask bit in uapi to tell which bits > are valid? as above reply, if userspace has the check list for the cap/ecap bits, then it's not necessary to use mask bit. > Unless 0/1 doesn't represent validity of some bit. Do we > have such example? yes, like the pasid bits. it's 20 bits. but we already got pasid_bits in the iommu_nesting_info_vtd structure. so it's not covered in the ecap_bits. Regards, Yi Liu > > > > The link below show the current Intel vIOMMU usage on the cap/ecap bits. > > For each assigned device, vIOMMU will compare the nesting related bits in > > cap/ecap and mask out the bits which hardware doesn't support. After the > > machine is intilized, the vIOMMU cap/ecap bits are determined. If user > > hot-plug devices to VM, vIOMMU will fail it if the hardware cap/ecap bits > > behind hot-plug device are not compatible with determined vIOMMU > > cap/ecap > > bits. > > > > https://www.spinics.net/lists/kvm/msg218294.html > > > > Regards, > > Yi Liu > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > + */ > > > > > > +struct iommu_nesting_info_vtd { > > > > > > + __u32 flags; > > > > > > + __u16 addr_width; > > > > > > + __u16 pasid_bits; > > > > > > + __u64 cap_reg; > > > > > > + __u64 cap_mask; > > > > > > + __u64 ecap_reg; > > > > > > + __u64 ecap_mask; > > > > > > +}; > > > > > > + > > > > > > #endif /* _UAPI_IOMMU_H */