On Wed, Nov 06, 2024 at 04:37:53PM +0000, Robin Murphy wrote: > On 2024-11-04 12:41 pm, Jason Gunthorpe wrote: > > On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote: > > > > +/** > > > > + * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information > > > > + * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3) > > > > + * > > > > + * @flags: Must be set to 0 > > > > + * @__reserved: Must be 0 > > > > + * @idr: Implemented features for ARM SMMU Non-secure programming interface > > > > + * @iidr: Information about the implementation and implementer of ARM SMMU, > > > > + * and architecture version supported > > > > + * @aidr: ARM SMMU architecture version > > > > + * > > > > + * For the details of @idr, @iidr and @aidr, please refer to the chapters > > > > + * from 6.3.1 to 6.3.6 in the SMMUv3 Spec. > > > > + * > > > > + * User space should read the underlying ARM SMMUv3 hardware information for > > > > + * the list of supported features. > > > > + * > > > > + * Note that these values reflect the raw HW capability, without any insight if > > > > + * any required kernel driver support is present. Bits may be set indicating the > > > > + * HW has functionality that is lacking kernel software support, such as BTM. If > > > > + * a VMM is using this information to construct emulated copies of these > > > > + * registers it should only forward bits that it knows it can support. > > But how *is* a VMM supposed to know what it can support? I answered a related question to Mostafa with an example: https://lore.kernel.org/linux-iommu/20240903235532.GJ3773488@xxxxxxxxxx/ "global" capabilities that are enabled directly from the CD entry would follow the pattern. > Are they all expected to grovel the host devicetree/ACPI tables and > maintain their own knowledge of implementation errata to understand > what's actually usable? No, VMMs are expected to only implement base line features we have working today and not blindly add new features based only HW registers reported here. Each future capability we want to enable at the VMM needs an analysis: 1) Does it require kernel SW changes, ie like BTM? Then it needs a kernel_capabilities bit to say the kernel SW exists 2) Does it require data from ACPI/DT/etc? Then it needs a kernel_capabilities bit 3) Does it need to be "turned on" per VM, ie with a VMS enablement? Then it needs a new request flag in ALLOC_VIOMMU 4) Otherwise it can be read directly from the idr[] array This is why the comment above is so stern that the VMM "should only forward bits that it knows it can support". > S2 tables it its own business. AFAICS, unless the VMM wants to do some > fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing > a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work. If the VMM wrongly indicates HTTU support to the VM, because it wrongly inspected those bits in the idr report, then it is just broken. > I would say it does. Advertising a feature when we already know it's not > usable at all puts a non-trivial and unnecessary burden on the VMM and VM to > then have to somehow derive that information from other sources, at the risk > of being confused by unexpected behaviour if they don't. That is not the purpose here, the register report is not to be used as "advertising features". It describes details of the raw HW that the VMM may need to use *some* of the fields. There are quite a few fields that fit #4 today: OAS, VAX, GRAN, BBML, CD2L, etc. Basically we will pass most of the bits and mask a few. If we get the masking wrong and pass something we shouldn't, then we've improved nothing compared to this proposal. I think we are likely to get the masking wrong :) > We sanitise CPU ID registers for userspace and KVM, so I see no compelling > reason for SMMU ID registers to be different. We discussed this already: https://lore.kernel.org/linux-iommu/20240904120103.GB3915968@xxxxxxxxxx It is a false comparison, for KVM the kernel is responsible to control the CPU ID registers. Reporting the registers the VM sees to the VMM makes alot of sense. For SMMU the VMM exclusively controls the VM's ID registers. If you still feel strongly about this please let me know by Friday and I will drop the idr[] array from this cycle. We can continue to discuss a solution for the next cycle. Regards, Jason