On 15/02/18 04:17, Tomasz Figa wrote:
[...]
> Could you elaborate on what kind of locking you are concerned about?
> As I explained before, the normally happening fast path would lock
> dev->power.lock only for the brief moment of incrementing the runtime
> PM usage counter.
>
> My bad, that's not even it.
>
> The atomic usage counter is incremented beforehand, without any
> locking [1], and the spinlock is acquired only for the sake of
> validating that the device's runtime PM state indeed remained valid [2],
> which would be the case in the fast path of the same driver doing two
> mappings in parallel, with the master powered on (and so the SMMU,
> through device links; if the master was not powered on already, powering
> on the SMMU is unavoidable anyway and would add much more latency than
> the spinlock itself).
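
For reference, the fast path being described is essentially this shape
(an illustrative sketch with made-up names like my_smmu_map() and
my_smmu_do_map(), not the actual arm-smmu code):

#include <linux/pm_runtime.h>
#include <linux/iommu.h>

static int my_smmu_map(struct iommu_domain *domain, unsigned long iova,
                       phys_addr_t paddr, size_t size, int prot)
{
        struct device *dev = my_smmu_dev(domain);      /* hypothetical helper */
        int ret;

        /*
         * If the SMMU is already RPM_ACTIVE (e.g. held on through the
         * device link from a powered-up master), pm_runtime_get_sync()
         * only does an atomic usage-count increment plus a short
         * critical section under dev->power.lock to confirm the state,
         * without touching hardware. Otherwise it powers the SMMU up,
         * which costs far more than the spinlock anyway.
         */
        ret = pm_runtime_get_sync(dev);
        if (ret < 0) {
                pm_runtime_put_noidle(dev);
                return ret;
        }

        ret = my_smmu_do_map(domain, iova, paddr, size, prot); /* hypothetical */

        pm_runtime_put(dev);
        return ret;
}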
We now have no locking at all in the map path, and only a per-domain
lock around the TLB sync in unmap, which is unfortunately necessary for
correctness; the latter isn't too terrible, since in "serious" hardware
it should only be serialising a few CPUs serving the same device against
each other (e.g. for multiple queues on a single NIC).
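
Concretely, the current shape is more or less the following (again just
a sketch with invented names, not the real code):

#include <linux/iommu.h>
#include <linux/spinlock.h>

struct my_smmu_domain {
        struct iommu_domain     domain;
        spinlock_t              sync_lock;      /* per-domain, TLB sync only */
};

static size_t my_smmu_unmap(struct iommu_domain *domain,
                            unsigned long iova, size_t size)
{
        struct my_smmu_domain *d = to_my_smmu_domain(domain);  /* hypothetical */
        unsigned long flags;
        size_t unmapped;

        /* The page-table updates themselves take no lock at all... */
        unmapped = my_smmu_pgtable_unmap(d, iova, size);        /* hypothetical */

        /*
         * ...and only the TLB sync is serialised, per domain, so at
         * worst it contends between the few CPUs servicing the same
         * device.
         */
        spin_lock_irqsave(&d->sync_lock, flags);
        my_smmu_tlb_sync(d);                                    /* hypothetical */
        spin_unlock_irqrestore(&d->sync_lock, flags);

        return unmapped;
}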
Putting in a global lock which serialises *all* concurrent map and unmap
calls for *all* unrelated devices makes things worse. Period. Even if
the lock itself were held for the minimum possible time, i.e. trivially
"spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing
that one cache line around between 96 CPUs across two sockets is not
negligible.
> [1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028
> [2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613
> In any case, I can't imagine this working with V4L2 or anything else
> relying on memory management more generic than calling the IOMMU API
> directly from the driver, with the IOMMU device having runtime PM
> enabled, but without managing the runtime PM from the IOMMU driver's
> callbacks that need access to the hardware. As I mentioned before,
> only the IOMMU driver knows when exactly the real hardware access
> needs to be done (e.g. Rockchip/Exynos don't need to do it for
> map/unmap if the power is down, but some SMMU implementations with a
> separately powered TLB might need to do so).
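
(For concreteness, that Rockchip/Exynos-style pattern is roughly the
following; names are made up, and it relies on pm_runtime_get_if_in_use()
only taking a reference if the device is currently active:)

#include <linux/io.h>
#include <linux/pm_runtime.h>

struct my_iommu {
        struct device *dev;
        void __iomem *base;
};

static void my_iommu_flush_tlb(struct my_iommu *iommu)
{
        /*
         * 1 means we got a runtime-PM reference (device active), 0 means
         * the IOMMU is suspended - its TLB will be reprogrammed on resume,
         * so there is nothing to flush - and a negative value means
         * runtime PM is disabled, i.e. the device is assumed always on.
         */
        int ret = pm_runtime_get_if_in_use(iommu->dev);

        if (!ret)
                return;

        writel(MY_IOMMU_FLUSH_ALL, iommu->base + MY_IOMMU_CMD); /* made-up reg */

        if (ret > 0)
                pm_runtime_put(iommu->dev);
}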
It's worth noting that Exynos and Rockchip are relatively small
self-contained IP blocks integrated closely with the interfaces of their
relevant master devices; SMMU is an architecture, implementations of
which may be large, distributed, and have complex and wildly differing
internal topologies. As such, it's a lot harder to make
hardware-specific assumptions and/or be correct for all possible cases.
Don't get me wrong, I do agree that the IOMMU driver is the only agent
who ultimately knows what calls are going to be necessary for whatever
operation it's performing on its own hardware*; it's just that
for SMMU it needs to be implemented in a way that has zero impact on the
cases where it doesn't matter, because it's not viable to specialise
that driver for any particular IP implementation/use-case.
Robin.
*AFAICS it still makes some sense to have the get_suppliers option as
well, though - the IOMMU driver does what it needs for correctness
internally, but an external consumer doing something non-standard can
grab and hold the link around multiple calls to short-circuit that.
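
Roughly what I have in mind there, assuming something along the lines of
pm_runtime_get_suppliers()/pm_runtime_put_suppliers() ends up callable by
the consumer (sketch only, with an invented master_map_batch() helper):

#include <linux/iommu.h>
#include <linux/pm_runtime.h>
#include <linux/scatterlist.h>

static size_t master_map_batch(struct device *dev, struct iommu_domain *domain,
                               unsigned long iova, struct scatterlist *sg,
                               unsigned int nents)
{
        size_t mapped;

        /*
         * Pin the suppliers (the SMMU, linked via device links) once for
         * the whole batch, so every individual map below only ever sees
         * the runtime-PM fast path.
         */
        pm_runtime_get_suppliers(dev);

        mapped = iommu_map_sg(domain, iova, sg, nents,
                              IOMMU_READ | IOMMU_WRITE);

        pm_runtime_put_suppliers(dev);
        return mapped;
}

Whether that ends up as direct calls or some thin wrapper doesn't matter
much; the point is just to let an informed consumer amortise the supplier
reference over many calls.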