On 2022-09-08 01:43, Jason Gunthorpe wrote:
On Wed, Sep 07, 2022 at 08:41:13PM +0100, Robin Murphy wrote:
FWIW, we're now very close to being able to validate dev->iommu against
where the domain came from in core code, and so short-circuit ->attach_dev
entirely if they don't match.
I don't think this is a long term direction. We have systems now with
a number of SMMU blocks and we really are going to see a need that
they share the iommu_domains so we don't have unncessary overheads
from duplicated io page table memory.
So ultimately I'd expect to pass the iommu_domain to the driver and
the driver will decide if the page table memory it represents is
compatible or not. Restricting to only the same iommu instance isn't
good..
Who said IOMMU instance?
Ah, I completely misunderstood what 'dev->iommu' was referring too, OK
I see.
Again, not what I was suggesting. In fact the nature of iommu_attach_group()
already rules out bogus devices getting this far, so all a driver currently
has to worry about is compatibility of a device that it definitely probed
with a domain that it definitely allocated. Therefore, from a caller's point
of view, if attaching to an existing domain returns -EINVAL, try another
domain; multiple different existing domains can be tried, and may also
return -EINVAL for the same or different reasons; the final attempt is to
allocate a fresh domain and attach to that, which should always be nominally
valid and *never* return -EINVAL. If any attempt returns any other error,
bail out down the usual "this should have worked but something went wrong"
path. Even if any driver did have a nonsensical "nothing went wrong, I just
can't attach my device to any of my domains" case, I don't think it would
really need distinguishing from any other general error anyway.
The algorithm you described is exactly what this series does, it just
used EMEDIUMTYPE instead of EINVAL. Changing it to EINVAL is not a
fundamental problem, just a bit more work.
Looking at Nicolin's series there is a bunch of existing errnos that
would still need converting, ie EXDEV, EBUSY, EOPNOTSUPP, EFAULT, and
ENXIO are all returned as codes for 'domain incompatible with device'
in various drivers. So the patch would still look much the same, just
changing them to EINVAL instead of EMEDIUMTYPE.
That leaves the question of the remaining EINVAL's that Nicolin did
not convert to EMEDIUMTYPE.
eg in the AMD driver:
if (!check_device(dev))
return -EINVAL;
iommu = rlookup_amd_iommu(dev);
if (!iommu)
return -EINVAL;
These are all cases of 'something is really wrong with the device or
iommu, everything will fail'. Other drivers are using ENODEV for this
already, so we'd probably have an additional patch changing various
places like that to ENODEV.
This mixture of error codes is the basic reason why a new code was
used, because none of the existing codes are used with any
consistency.
But OK, I'm on board, lets use more common errnos with specific
meaning, that can be documented in a comment someplace:
ENOMEM - out of memory
ENODEV - no domain can attach, device or iommu is messed up
EINVAL - the domain is incompatible with the device
<others> - Same behavior as ENODEV, use is discouraged.
I think achieving consistency of error codes is a generally desirable
goal, it makes the error code actually useful.
Joerg this is a good bit of work, will you be OK with it?
Thus as long as we can maintain that basic guarantee that attaching
a group to a newly allocated domain can only ever fail for resource
allocation reasons and not some spurious "incompatibility", then we
don't need any obscure trickery, and a single, clear, error code is
in fact enough to say all that needs to be said.
As above, this is not the case, drivers do seem to have error paths
that are unconditional on the domain. Perhaps they are just protective
assertions and never happen.
Right, that's the gist of what I was getting at - I think it's worth
putting in the effort to audit and fix the drivers so that that *can* be
the case, then we can have a meaningful error API with standard codes
effectively for free, rather than just sighing at the existing mess and
building a slightly esoteric special case on top.
Case in point, the AMD checks quoted above are pointless, since it
checks the same things in ->probe_device, and if that fails then the
device won't get a group so there's no way for it to even reach
->attach_dev any more. I'm sure there's a *lot* of cruft that can be
cleared out now that per-device and per-domain ops give us this kind of
inherent robustness.
Cheers,
Robin.
Regardless, it doesn't matter. If they return ENODEV or EINVAL the
VFIO side algorithm will continue to work fine, it just does alot more
work if EINVAL is permanently returned.
Thanks,
Jason