On Thu, Feb 22, 2018 at 3:13 AM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote: > On Fri, Feb 16, 2018 at 9:13 AM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote: >> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@xxxxxxx> wrote: >>> On 15/02/18 04:17, Tomasz Figa wrote: >>> [...] >>>>> >>>>> Could you elaborate on what kind of locking you are concerned about? >>>>> As I explained before, the normally happening fast path would lock >>>>> dev->power_lock only for the brief moment of incrementing the runtime >>>>> PM usage counter. >>>> >>>> >>>> My bad, that's not even it. >>>> >>>> The atomic usage counter is incremented beforehands, without any >>>> locking [1] and the spinlock is acquired only for the sake of >>>> validating that device's runtime PM state remained valid indeed [2], >>>> which would be the case in the fast path of the same driver doing two >>>> mappings in parallel, with the master powered on (and so the SMMU, >>>> through device links; if master was not powered on already, powering >>>> on the SMMU is unavoidable anyway and it would add much more latency >>>> than the spinlock itself). >>> >>> >>> We now have no locking at all in the map path, and only a per-domain lock >>> around TLB sync in unmap which is unfortunately necessary for correctness; >>> the latter isn't too terrible, since in "serious" hardware it should only be >>> serialising a few cpus serving the same device against each other (e.g. for >>> multiple queues on a single NIC). >>> >>> Putting in a global lock which serialises *all* concurrent map and unmap >>> calls for *all* unrelated devices makes things worse. Period. Even if the >>> lock itself were held for the minimum possible time, i.e. trivially >>> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that >>> one cache line around between 96 CPUs across two sockets is not negligible. >> >> Fair enough. Note that we're in a quite interesting situation now: >> a) We need to have runtime PM enabled on Qualcomm SoC to have power >> properly managed, >> b) We need to have lock-free map/unmap on such distributed systems, >> c) If runtime PM is enabled, we need to call into runtime PM from any >> code that does hardware accesses, otherwise the IOMMU API (and so DMA >> API and then any V4L2 driver) becomes unusable. >> >> I can see one more way that could potentially let us have all the >> three. How about enabling runtime PM only on selected implementations >> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded >> with if (pm_runtime_enabled()), which is lockless? >> > > Sorry for pinging, but any opinion on this kind of approach? > It is ok by me, for whatever that is worth BR, -R -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html