On 2024-11-06 6:05 pm, Jason Gunthorpe wrote:
On Wed, Nov 06, 2024 at 04:37:53PM +0000, Robin Murphy wrote:
On 2024-11-04 12:41 pm, Jason Gunthorpe wrote:
On Mon, Nov 04, 2024 at 11:47:24AM +0000, Will Deacon wrote:
+/**
+ * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
+ * (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ * @idr: Implemented features for ARM SMMU Non-secure programming interface
+ * @iidr: Information about the implementation and implementer of ARM SMMU,
+ * and architecture version supported
+ * @aidr: ARM SMMU architecture version
+ *
+ * For the details of @idr, @iidr and @aidr, please refer to the chapters
+ * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
+ *
+ * User space should read the underlying ARM SMMUv3 hardware information for
+ * the list of supported features.
+ *
+ * Note that these values reflect the raw HW capability, without any insight if
+ * any required kernel driver support is present. Bits may be set indicating the
+ * HW has functionality that is lacking kernel software support, such as BTM. If
+ * a VMM is using this information to construct emulated copies of these
+ * registers it should only forward bits that it knows it can support.
But how *is* a VMM supposed to know what it can support?
I answered a related question to Mostafa with an example:
https://lore.kernel.org/linux-iommu/20240903235532.GJ3773488@xxxxxxxxxx/
"global" capabilities that are enabled directly from the CD entry
would follow the pattern.
Are they all expected to grovel the host devicetree/ACPI tables and
maintain their own knowledge of implementation errata to understand
what's actually usable?
No, VMMs are expected to only implement base line features we have
working today and not blindly add new features based only HW registers
reported here.
Each future capability we want to enable at the VMM needs an analysis:
1) Does it require kernel SW changes, ie like BTM? Then it needs a
kernel_capabilities bit to say the kernel SW exists
2) Does it require data from ACPI/DT/etc? Then it needs a
kernel_capabilities bit
3) Does it need to be "turned on" per VM, ie with a VMS enablement?
Then it needs a new request flag in ALLOC_VIOMMU
4) Otherwise it can be read directly from the idr[] array
This is why the comment above is so stern that the VMM "should only
forward bits that it knows it can support".
So... you're saying this patch is in fact broken, or at least uselessly
incomplete, since VMMs aren't allowed to emulate a vSMMU at all without
first consulting some other interface which does not exist? Great.
S2 tables it its own business. AFAICS, unless the VMM wants to do some
fiddly CD shadowing, it's going to be kinda hard to prevent the SMMU seeing
a guest CD with CD.HA and/or CD.HD set if the guest expects S1 HTTU to work.
If the VMM wrongly indicates HTTU support to the VM, because it
wrongly inspected those bits in the idr report, then it is just
broken.
What do you mean? We could have a system right now where the hardware is
configured with SMMU_IDR0.HTTU=2, but it turned out that atomics were
broken in the interconnect so firmware sets the IORT "HTTU override"
field is set to 0. We know about that in the kernel, but all a VMM sees
is iommu_hw_info_arm_smmuv3.idr[0] indicating HTTU=2. If it is "broken"
to take the only information available at face value, assume HTTU is
available, and reflect that in a vSMMU interface, then what is the
correct thing to do, other than to not dare emulate a vSMMU at all, in
fear of a sternly worded comment?
I would say it does. Advertising a feature when we already know it's not
usable at all puts a non-trivial and unnecessary burden on the VMM and VM to
then have to somehow derive that information from other sources, at the risk
of being confused by unexpected behaviour if they don't.
That is not the purpose here, the register report is not to be used as
"advertising features". It describes details of the raw HW that the
VMM may need to use *some* of the fields.
There are quite a few fields that fit #4 today: OAS, VAX, GRAN, BBML,
CD2L, etc.
Basically we will pass most of the bits and mask a few. If we get the
masking wrong and pass something we shouldn't, then we've improved
nothing compared to this proposal. I think we are likely to get the
masking wrong :)
Seriously? A simple inverse of the feature detection the kernel driver
already does for its own needs, implemented once in the same place, is hard?
Compared to maintaining the exact same information within the driver but
in some new different form, and also maintaining it in the UAPI, and
having every VMM ever all do the same work to put the two together, and
always be up to date with the right UAPI, and never ever let any field
slip through as-is, especially not all the ones which were RES0 at time
of writing, enforced by a sternly worded comment? Why yes, of course I
can see how that's trivially easy and carries no risk whatsoever.
We sanitise CPU ID registers for userspace and KVM, so I see no compelling
reason for SMMU ID registers to be different.
We discussed this already:
https://lore.kernel.org/linux-iommu/20240904120103.GB3915968@xxxxxxxxxx
It is a false comparison, for KVM the kernel is responsible to control
the CPU ID registers. Reporting the registers the VM sees to the VMM
makes alot of sense. For SMMU the VMM exclusively controls the VM's ID
registers.
Pointing out that two things are different is a false comparison because
they are different, by virtue of your choice to make them different?
Please try making sense.
Your tautology still does not offer any reasoning against doing the
logical thing and following the same basic pattern: the kernel uses the
ID register mechanism itself to advertise the set of features it's
able/willing to support, by sanitising the values it offers to the VMM,
combining the notions of hardware and kernel support where the
distinction is irrelevant anyway. The VMM is then still free to take
those values and hide more features, or potentially add any that it is
capable of emulating without the kernel's help, and advertise that final
set to the VM. Obviously there are significant *implementation*
differences, most notably that the latter VMM->VM part doesn't need to
involve IOMMUFD at all since MMIO register emulation can stay entirely
in userspace, whereas for CPU system registers the final VM-visible
values need to be plugged back in to KVM for it to handle the traps.
We are all asking you to explain why you think doing the kernel->VMM
advertisement naturally and intuitively is somehow bad, and forcing VMMs
to instead rely on a more complex, fragile, and crucially non-existent
additional interface is better. You should take "We discussed this
already" as more of a clue to yourself than to me - if 4 different
people have all said the exact same thing in so many words, perhaps
there's something in it...
And in case I need to spell it out with less sarcasm, "we'll get masking
wrong in the kernel" only implies "we'll get kernel_capabilities wrong
in the kernel (and elsewhere)", so it's clearly not a useful argument to
keep repeating. Besides, as KVM + sysfs + MRS emulation shows, we're
pretty experienced at masking ID registers in the kernel. It's not hard
to do it right in a robust manner, where particularly with the nature of
SMMU features, the only real risk might be forgetting to expose
something new once we do actually support it.
If you still feel strongly about this please let me know by Friday and
I will drop the idr[] array from this cycle. We can continue to
discuss a solution for the next cycle.
It already can't work as-is, I don't see how making it even more broken
would help. IMO it doesn't seem like a good idea to be merging UAPI at
all while it's still clearly incomplete and by its own definition unusable.
Thanks,
Robin.