Re: [RFC 1/3] iommu/arm-smmu: Add support to opt-in to stalling

Rob Clark <robdclark@xxxxxxxxx> · Fri, 6 Jan 2017 11:26:49 -0500

On Thu, Jan 5, 2017 at 10:49 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
> On Thu, Jan 05, 2017 at 10:27:27AM -0500, Rob Clark wrote:
>> On Thu, Jan 5, 2017 at 6:55 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
>> > On Tue, Jan 03, 2017 at 04:30:54PM -0500, Rob Clark wrote:
>> >> TODO maybe we want two options, one to enable stalling, and 2nd to punt
>> >> handling to wq?  I haven't needed to use mm APIs from fault handler yet
>> >> (although it is something that I think we'll want some day).  Perhaps
>> >> stalling support is limited to just letting driver dump some extra
>> >> debugging information otherwise.  Threaded handling probably only useful
>> >> with stalling, but inverse may not always be true.
>> >
>> > I'd actually like to see this stuck on a worker thread, because I think
>> > that's more generally useful and I don't want to have a situation where
>> > sometimes the IOMMU fault notifier is run in IRQ context and sometimes it's
>> > not.
>>
>> So I was talking a bit w/ Jordan on IRC yesterday..  and we also have
>> the GPU's hw hang-detect to contend with.  So I *suspect* that when we
>> get to the point of using this to do things like page in things from
>> swap and resume the faulting transaction, we probably want to get
>> called immediately from the IRQ handler so we can disable the hw
>> hang-detect.
>
> Well, if you want to use an SMMU for paging, then the GPU driver would
> need to request that explicitly when allocating its DMA buffers, to that
> would be the time to either delay or disable the hang detection.

If userspace is using SVM, for example, it is pretty impossible to
know when to expect a fault.  The best you could do is keep track that
*some* process which has active work queued up for gpu is using SVM
and disable hang detect for *everyone*.. which is kind of sad.

>> I'm not sure if the better solution then would be to have two fault
>> callbacks, one immediately from the IRQ and a later one from wq.  Or
>> let the driver handle the wq business and give it a way to tell the
>> IOMMU when to resume.
>>
>> I kinda think we should punt on the worker thread for now until we are
>> ready to resume faulting transactions, because I guess a strong chance
>> that whatever way we do it now will be wrong ;-)
>
> I guess what I'm after is for you to change the interrupt handlers to be
> threaded, like they are for SMMUv3. I *think* you can do that with a NULL
> thread_fn for now, and just call report_iommu_fault from the handler.
> The return value of that could, in theory, be used to queued the paging
> request and wake the paging thread in future.

If we only pass in the non-threaded irq fxn, I'm not really sure how
that changes anything.. or maybe I'm not understanding what you mean.

But yeah, I guess we could use request_threaded_irq() to get both IRQ
context notification and a later thread context notification rather
than doing the wq thing.  Either way the iommu API has to change
slightly.

>> > I wonder if this should also be predicated on the compatible string, so
>> > that the "arm,smmu-enable-stall" property is ignored (with a warning) if
>> > the compatible string isn't specific enough to identify an implementation
>> > with the required SS behaviour? On the other hand, it feels pretty
>> > redundant and a single "stalling works" property is all we need.
>>
>> We could also drop the property and key the behavior on specific
>> compat strings I guess.  Having both seems a bit odd.  Anyways, I'll
>> defer to DT folks about what the cleaner approach is.
>
> As Robin pointed out, we do need to be able to distinguish the integration
> of the device from the device itself. For example, MMU-9000 might be capable
> of stalling, but if it's bolted to a PCI RC, it's not safe to do so.

Hmm, well we install the fault handler on the iommu_domain..  perhaps
maybe a combo of dts property (or deciding based on more specific
compat string), plus extra param passed in to
iommu_set_fault_hander().  The dts property or compat string to
indicate whether the iommu (and how it is wired up) can handle stalls,
and enable_stall param when fault handler is registered to indicate
whether the device itself can cope.. if either can't do stalling, then
don't set CFCFG.

BR,
-R
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html