Re: RMRR device on non-Intel platform

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 25 Apr 2023 12:58:25 -0300

On Tue, Apr 25, 2023 at 03:48:11PM +0100, Robin Murphy wrote:
> On 2023-04-21 18:58, Jason Gunthorpe wrote:
> > On Fri, Apr 21, 2023 at 06:22:37PM +0100, Robin Murphy wrote:
> > 
> > > I think a slightly more considered and slightly less wrong version of that
> > > idea is to mark it as IOMMU_RESV_MSI, and special-case direct-mapping those
> > > on Arm (I believe it would technically be benign to do on x86 too, but might
> > > annoy people with its pointlessness). However...
> > 
> > I'd rather have an IOMMU_RESV_MSI_DIRECT and put the ARM special case
> > in ARM code..
> 
> Maybe, but it's still actually broken either way, because how do you get
> that type into the VM? Firmware can't encode that a particular RMR
> represents the special magic hack for IOMMUFD, so now the SMMU driver needs
> to somehow be aware when it's running in a VM offering nested translation
> and do some more magic to inject the appropriate region, and it's all
> just... no.

Er, I figured ARM had sorted this out somehow :(

Eric, do you know anything about this? Where did you set up the 1:1 map
in the VM in your series?

So you are saying the basic problem statement is that the ACPI table
that describes the ITS direct mapping in the VM is supposed to be
interpreted by the SMMU driver as "memory must be iommu mapped 1:1 at
all times and is possibly dangerous enough to block userspace access
to the device, like Intel does"?

This isn't end-of-the-world bad, it just means that VFIO will not work
in ARM guests under this interrupt model. Sad, and something to fix,
but we can still cover a lot of ground..

Maybe a GICv5 can correct it..

> > Frankly, I think something went wrong with the GICv4 design. A purely
> > virtualization focused GIC should not have continued to rely on the
> > hypervisor trapping of the MSI-X writes. The guest should have had a
> > real data value and a real physical ITS page.
> 
> ...I believe the remaining missing part is a UAPI for the VMM to ask the
> host kernel to configure a "physical" vLPI for a given device and EventID,
> at the point when its vITS emulation is handling the guest's configuration
> command. With that we would no longer have to rewrite the MSI payload
> either, so can avoid trapping the device's MSI-X capability at all, and the
> VM could actually have non-terrible interrupt performance.

Yes.. More broadly I think we'd need to allow the vGIC code to
understand that it has complete control over a SID, and, like we are
talking about for SMMU, a vSID mapping as well.

This would have to replace the eventfd based hookup we have now.

I really want to avoid opening this can of worms because it is
basically iommufd all over again just irq focused :(

> Except GCC says __builtin_ctzl(0) is undefined, so although I'd concur that
> the chances of nasal demons at the point of invoking __ffs() are
> realistically quite low, I don't fancy arguing that with the static checker
> brigade. So by the time we've appeased them with additional checks,
> initialisations, etc., we'd have basically the same overhead as running 0
> iterations of another for loop (the overwhelmingly common case anyway), but
> in more lines of code, with a more convoluted flow. All of which leads me to
> conclude that "number of times we walk a usually-empty list in a one-off
> slow path" is not in fact the most worthwhile thing to optimise for ;)

Heh, well fair enough, we do have a UBSAN that might trip on this. Lu
can correct it.
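
For reference, a minimal userspace sketch of the hazard Robin describes,
not the actual patch under discussion. lowest_set_bit() and
visit_set_bits() are hypothetical names made up for illustration; the
only real interface used is GCC's __builtin_ctzl(), whose result is
documented as undefined for an argument of 0. The loop form never
reaches the builtin with a zero word, so the empty case costs zero
iterations:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel's __ffs()): index of the lowest
 * set bit, or -1 if no bit is set. The explicit zero check keeps
 * __builtin_ctzl() away from its undefined input.
 */
static int lowest_set_bit(unsigned long word)
{
	if (word == 0)
		return -1;
	return __builtin_ctzl(word);
}

/*
 * Hypothetical helper: store the index of every set bit in mask into
 * out[], lowest first, and return how many were stored. A zero mask
 * runs zero iterations, so the builtin is never called with 0.
 */
static size_t visit_set_bits(unsigned long mask, int *out, size_t max)
{
	size_t n = 0;

	while (mask && n < max) {
		out[n++] = __builtin_ctzl(mask);
		mask &= mask - 1;	/* clear the lowest set bit */
	}
	return n;
}
```

Either form avoids the undefined case; the loop just does it without a
separate check or a sentinel return value.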

Jason