Re: [PATCH v4 05/12] iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 6 Nov 2024 22:35:06 -0400

On Wed, Nov 06, 2024 at 09:05:26PM +0000, Robin Murphy wrote:
> > Each future capability we want to enable at the VMM needs an analysis:
> > 
> >   1) Does it require kernel SW changes, ie like BTM? Then it needs a
> >      kernel_capabilities bit to say the kernel SW exists
> >   2) Does it require data from ACPI/DT/etc? Then it needs a
> >      kernel_capabilities bit
> >   3) Does it need to be "turned on" per VM, ie with a VMS enablement?
> >      Then it needs a new request flag in ALLOC_VIOMMU
> >   4) Otherwise it can be read directly from the idr[] array
> > 
> > This is why the comment above is so stern that the VMM "should only
> > forward bits that it knows it can support".
> 
> So... you're saying this patch is in fact broken, or at least uselessly
> incomplete, since VMMs aren't allowed to emulate a vSMMU at all without
> first consulting some other interface which does not exist? Great.

That is too far, there is nothing fundamentally broken here.

This does not enable every SMMU feature in the architecture. It
implements a basic baseline and no more. The VMMs are only permitted
to read bits in that baseline. VMMs that want to go beyond that have
to go through the 4 steps above.

Fundamentally all we are arguing about here is if bits the VMM
shouldn't even read should be forced to zero by the kernel.

If the bits are forced to zero then perhaps they could be used as a
future SW feature negotiation, and maybe that would be better, or
maybe not. See below about BTM for a counter example.

I agree the kdoc does not describe what the baseline actually is.

> We know about that in the kernel, but all a VMM sees is
> iommu_hw_info_arm_smmuv3.idr[0] indicating HTTU=2. If it is "broken" to take
> the only information available at face value, assume HTTU is available, and
> reflect that in a vSMMU interface, then what is the correct thing to do,
> other than to not dare emulate a vSMMU at all, in fear of a sternly worded
> comment?

These patches support a baseline feature set. They do not support the
entire SMMU architecture. They do not support vHTTU.

Things are going step by step because this is a huge project. HTTU is
not in the baseline, so it is definitely wrong for any VMM working on
these patches to advertise support to the VM! The VMM *MUST* ignore
such bits in the IDR report.

The correct thing is the 4 steps I outlined above. When a VMM author
wants to go beyond the baseline they have to determine what kernel and
VMM software is required and implement the missing parts.

> Your tautology still does not offer any reasoning against doing the logical
> thing and following the same basic pattern: the kernel uses the ID register
> mechanism itself to advertise the set of features it's able/willing to
> support, by sanitising the values it offers to the VMM, combining the
> notions of hardware and kernel support where the distinction is irrelevant
> anyway.

Even with your plan the VMM can not pass the kernel's IDR register
into the VM unprotected. It must always sanitize it against the
features that the VMM supports.

Let's consider vBTM support. This requires kernel support to enable
the global BTM register *AND* a new VMM ioctl flow with iommufd to
link to KVM's VMID.

You are suggesting that the old kernel should wire IDR BTM to 0 and
the new kernel will set it to 1. However the VMM cannot just forward
this bit to the VM! A old VMM that does not do the KVM flow does NOT
support BTM.

All VMM's must mask the BTM bit from day 0, or we can never set it to
1 in a future kernel. Which is exactly the same plan as this patch!

> You should take "We discussed this already"
> as more of a clue to yourself than to me - if 4 different people have all
> said the exact same thing in so many words, perhaps there's something in
> it...

And all seemed to agree it was not a big deal after the discussion.

I think Mostafa was driving in a direction that we break up the IDR
into explicit fields and thus be explicit about what information the
VMM is able to access. This would effectively document and enforce
what the baseline is.

> And in case I need to spell it out with less sarcasm, "we'll get masking
> wrong in the kernel" only implies "we'll get kernel_capabilities wrong in
> the kernel (and elsewhere)",

Less sarcasm and hyperbole would be nice.

> > If you still feel strongly about this please let me know by Friday and
> > I will drop the idr[] array from this cycle. We can continue to
> > discuss a solution for the next cycle.
> 
> It already can't work as-is, 

As above, this is not true.

> I don't see how making it even more broken
> would help. IMO it doesn't seem like a good idea to be merging UAPI at all
> while it's still clearly incomplete and by its own definition unusable.

I agree that removing the idr would make it definitely unusable. There
is quite a bit of essential information in there the VMM
needs. However, the uAPI is extensible so adding a new field in the
next cycle is not breaking.

I'm trying to offer you an olive branch so we can have this discussion
with time instead of urgently at the 11th hour and derailing the
series. This exact topic has been discussed already two months ago,
and is, IMHO, mostly a style choice not an urgent technical matter.

If you have another option we can conclude in the next few days then
I'm happy to hear it.

If we focus only on the baseline functionality Nicolin can come up
with a list of fields and we can write it very explicitly in the kdoc
that the VMM can only read those fields.

I could also reluctantly agree to permanent masking to report only the
above kdoc bits in the kernel. Noting if you want this because you
don't trust the VMM authors to follow an explicit kdoc, then we likely
also cannot ever set some bits to 1. Eg BTM would be permanently 0.

Regards,
Jason