On Tue, Sep 10, 2019 at 8:01 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>
> On 07/09/2019 18:50, Rob Clark wrote:
> > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > When games, browsers, or anything using a lot of GPU buffers exits,
> > there can be many hundreds or thousands of buffers to unmap and free.
> > If the GPU is otherwise suspended, this can cause arm-smmu to
> > resume/suspend for each buffer, resulting in 5-10 seconds' worth of
> > reprogramming the context bank
> > (arm_smmu_write_context_bank()/arm_smmu_write_s2cr()/etc).  To the
> > user it would appear that the system is locked up.
> >
> > A simple solution is to use pm_runtime_put_autosuspend() instead, so
> > we don't immediately suspend the SMMU device.
> >
> > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > ---
> > Note: I've tied the autosuspend enable/delay to the consumer device,
> > based on the reasoning that if the consumer device benefits from
> > using an autosuspend delay, then its corresponding SMMU probably does
> > too.  Maybe that is overkill and we should just unconditionally
> > enable autosuspend.
>
> I'm not sure there's really any reason to expect that a supplier's usage
> model when doing things for itself bears any relation to that of its
> consumer(s), so I'd certainly lean towards the "unconditional" argument
> myself.

Sounds good, I'll respin w/ unconditional autosuspend.

> Of course ideally we'd skip resuming altogether in the map/unmap paths
> (since resume implies a full TLB reset anyway), but IIRC that approach
> started to get messy in the context of the initial RPM patchset.  I'm
> planning to fiddle around a bit more to clean up the implementation of
> the new iommu_flush_ops stuff, so I've made a note to myself to revisit
> RPM to see if there's a sufficiently clean way to do better.
> In the meantime, though, I don't have any real objection to using some
> reasonable autosuspend delay, on the principle that if we've been woken
> up to map/unmap one page, there's a high likelihood that more will
> follow in short order (and in the configuration slow-paths it won't
> have much impact either way).

That sort of reminds me of something I was chatting with Jordan about
the other day: how we could possibly skip the TLB invalidate for unmaps
from non-current pagetables once we have per-context pagetables.  The
challenge is that, since the GPU's command parser is the one switching
pagetables, we don't have any race-free way to know which pagetable is
current.  But we do know which contexts have work queued up for the
GPU, so we can know either that a given context definitely isn't
current, or that it might be current.  And in the "definitely not
current" case we could skip the TLB invalidate.

BR,
-R

> Robin.
>
> >  drivers/iommu/arm-smmu.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index c2733b447d9c..73a0dd53c8a3 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -289,7 +289,7 @@ static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> >  static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
> >  {
> >         if (pm_runtime_enabled(smmu->dev))
> > -               pm_runtime_put(smmu->dev);
> > +               pm_runtime_put_autosuspend(smmu->dev);
> >  }
> >
> >  static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
> > @@ -1445,6 +1445,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >         /* Looks ok, so add the device to the domain */
> >         ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
> >
> > +#ifdef CONFIG_PM
> > +       /* TODO maybe device_link_add() should do this for us? */
> > +       if (dev->power.use_autosuspend) {
> > +               pm_runtime_set_autosuspend_delay(smmu->dev,
> > +                       dev->power.autosuspend_delay);
> > +               pm_runtime_use_autosuspend(smmu->dev);
> > +       }
> > +#endif
> > +
> >  rpm_put:
> >         arm_smmu_rpm_put(smmu);
> >         return ret;
> >
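For anyone skimming the thread, the behaviour the patch relies on is the
difference between pm_runtime_put(), which allows the device to suspend
as soon as the usage count drops to zero, and pm_runtime_put_autosuspend(),
which only marks the device busy and lets it suspend once an inactivity
delay expires.  A userspace toy model of that distinction (all names here
are made up for illustration, this is not the kernel's runtime-PM code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of runtime-PM autosuspend; hypothetical structures and
 * function names, not the kernel API. */
struct rpm_dev {
	int usage_count;
	bool suspended;
	long last_busy_ms;
	long autosuspend_delay_ms;
};

static void rpm_get(struct rpm_dev *d, long now_ms)
{
	d->usage_count++;
	d->suspended = false;	/* resume: expensive for the SMMU */
	d->last_busy_ms = now_ms;
}

/* Plain put: suspend immediately once the usage count hits zero. */
static void rpm_put(struct rpm_dev *d, long now_ms)
{
	(void)now_ms;
	if (--d->usage_count == 0)
		d->suspended = true;
}

/* Autosuspend put: just record "last busy"; actual suspend is left
 * to a timer, so back-to-back unmaps never pay the resume cost. */
static void rpm_put_autosuspend(struct rpm_dev *d, long now_ms)
{
	--d->usage_count;
	d->last_busy_ms = now_ms;
}

/* Autosuspend timer: suspend only after the delay has elapsed with
 * no new activity and no remaining users. */
static void rpm_timer_tick(struct rpm_dev *d, long now_ms)
{
	if (d->usage_count == 0 &&
	    now_ms - d->last_busy_ms >= d->autosuspend_delay_ms)
		d->suspended = true;
}
```

With a burst of unmaps, each rpm_get() after a rpm_put_autosuspend()
finds the device still powered, which is exactly what avoids the
repeated context-bank reprogramming described in the commit message.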
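The "definitely not current" idea from earlier in the thread could be
modelled roughly like this (again a userspace sketch with made-up names,
not msm driver code): track jobs in flight per context, defer the TLB
invalidate while a context provably has nothing queued on the GPU, and
flush the deferred invalidate before that context's next job runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-context bookkeeping for the "skip TLB invalidate
 * on unmap from a definitely-not-current pagetable" idea. */
struct ctx {
	int jobs_in_flight;	/* jobs queued or running on the GPU */
	bool tlb_dirty;		/* unmaps happened while we deferred */
};

static void ctx_job_retire(struct ctx *c)
{
	c->jobs_in_flight--;
}

/* Unmap path: returns true if a TLB invalidate was actually issued. */
static bool ctx_unmap(struct ctx *c)
{
	if (c->jobs_in_flight == 0) {
		/* Nothing queued: the command parser cannot be running
		 * on this context's pagetable, so defer the invalidate. */
		c->tlb_dirty = true;
		return false;
	}
	/* Might be current: invalidate immediately, as today. */
	return true;
}

/* Submit path: before new work runs for this context, flush any
 * deferred invalidate.  Returns true if a flush was needed. */
static bool ctx_job_prepare(struct ctx *c)
{
	bool flushed = c->tlb_dirty;

	c->tlb_dirty = false;
	c->jobs_in_flight++;
	return flushed;
}
```

The race the thread worries about is sidestepped by only ever skipping
in the provably-idle case; whenever the answer is "might be current",
the sketch falls back to invalidating immediately.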