On 2022-02-23 05:01, Lu Baolu wrote:
On 2/23/22 7:53 AM, Jason Gunthorpe wrote:
To spell it out, the scheme I'm proposing looks like this:
Well, I already got this, it is what is in driver_or_DMA_API_token()
that matters
I think you are suggesting to do something like:
if (!READ_ONCE(dev->driver) || ???)
return NULL;
return group; // A DMA_API 'token'
Which is locklessly reading dev->driver, and why you are talking about
races, I guess.
I am afraid that we are not able to implement a race-free
driver_or_DMA_API_token() helper. The lock problem between the IOMMU
core and driver core always exists.
It's not race-free. My point is that the races aren't harmful because
what we might infer from the "wrong" information still leads to the
right action. dev->driver is obviously always valid and constant for
*claiming* ownership, since that either happens for the DMA API in the
middle of really_probe() binding driver to dev, or while driver is
actively using dev and calling iommu_group_claim_dma_owner(). The races
exist during remove, but both probe and remove are serialised on the
group mutex after respectively setting/clearing dev->driver, there are
only 4 possibilities for the state of any other group sibling "tmp"
during the time that dev holds that mutex in its remove path:
1 - tmp->driver is non-NULL because tmp is already bound.
1.a - If tmp->driver->driver_managed_dma == 0, the group must
currently be DMA-API-owned as a whole. Regardless of what driver dev has
unbound from, its removal does not release someone else's DMA API
(co-)ownership.
1.b - If tmp->driver->driver_managed_dma == 1 and tmp->driver ==
group->owner, then dev must have unbound from the same driver, but
either way that driver has not yet released ownership so dev's removal
does not change anything.
1.c - If tmp->driver->driver_managed_dma == 1 and tmp->driver !=
group->owner, it doesn't matter. Even if tmp->driver is currently
waiting to attempt to claim ownership it can't do so until we release
the mutex.
2 - tmp->driver is non-NULL because tmp is in the process of binding.
2.a - If tmp->driver->driver_managed_dma == 0, tmp can be assumed to
be waiting on the group mutex to claim DMA API ownership.
2.a.i - If the group is DMA API owned, this race is simply a
short-cut to case 1.a - dev's ownership is effectively handed off
directly to tmp, rather than potentially being released and immediately
reclaimed. Once tmp gets its turn, it finds the group already
DMA-API-owned as it wanted and all is well. This may be "unfair" if an
explicit claim was also waiting, but not incorrect.
2.a.ii - If the group is driver-owned, it doesn't matter. Removing
dev does not change the current ownership, and tmp's probe will
eventually get its turn and find whatever it finds at that point in future.
2.b - If tmp->driver->driver_managed_dma == 1, it doesn't matter.
Either that driver already owns the group, or it might try to claim it
after we've resolved dev's removal and released the mutex, in which case
it will find whatever it finds.
3 - tmp->driver is NULL because tmp is unbound. Obviously no impact.
4 - tmp->driver is NULL because tmp is in the process of unbinding.
4.a - If the group is DMA-API-owned, either way tmp has no further
influence.
4.a.i - If tmp has unbound from a driver_managed_dma=0 driver, it
must be waiting to release its DMA API ownership, thus if tmp would
otherwise be the only remaining DMA API owner, the race is that dev's
removal releases ownership on behalf of both devices. When tmp's own
removal subsequently gets the mutex, it will either see that the group
is already unowned, or maybe that someone else has re-claimed it in the
interim, and either way do nothing, which is fine.
4.a.ii - If tmp has unbound from a driver_managed_dma=1 driver, it
doesn't matter, as in case 1.c.
4.b - If the group is driver-owned, it doesn't matter. That ownership
can only change if that driver releases it, which isn't happening while
we hold the mutex.
As I said yesterday, I'm really just airing out an idea here; I might
write up some proper patches as part of the bus ops work, and we can
give it proper scrutiny then.
For example, when we implemented iommu_group_store_type() to change the
default domain type of a device through sysfs, we could only comprised
and limited this functionality to singleton groups to avoid the lock
issue.
Indeed, but once the probe and remove paths for grouped devices have to
serialise on the group mutex, as we're introducing here, the story
changes and we gain a lot more power. In fact that's a good point I
hadn't considered yet - that sysfs constraint is functionally equivalent
to the one in iommu_attach_device(), so once we land this ownership
concept we should be free to relax it from "singleton" to "unowned" in
much the same way as your other series is doing for attach.
Unfortunately, that compromise cannot simply applied to the problem to
be solved by this series, because the iommu core cannot abort the driver
binding when the conflict is detected in the bus notifier.
No, I've never proposed that probe-time DMA ownership can be claimed
from a notifier, we all know why that doesn't work. It's only the
dma_cleanup() step that *could* be punted back to iommu_bus_notifier vs.
the driver core having to know about it. Either way we're still
serialising remove/failure against probe/remove of other devices in a
group, and that's the critical aspect.
Thanks,
Robin.