On Mon, 21 Jun 2021 16:09:32 +0100
Steven Price <steven.price@xxxxxxx> wrote:

> On 21/06/2021 14:39, Boris Brezillon wrote:
> > If we don't do that, we have to wait for the job timeout to expire
> > before the fault jobs gets killed.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
>
> Don't we need to do something here to allow recovery of the MMU context
> in the future? panfrost_mmu_disable() will zero out the MMU registers on
> the hardware, but AFAICS panfrost_mmu_enable() won't be called to
> restore the values until something evicts the address space (GPU power
> down/reset or just too many other processes).
>
> The ideal would be to block submission of new jobs from this context and
> then wait until existing jobs have completed at which point the MMU
> state can be restored and jobs allowed again.

Uh, I assumed it'd be okay to have subsequent jobs coming from
this context to fail with a BUS_FAULT until the context is closed. But
what you suggest seems more robust.

>
> But at a minimum I think we should have something like an 'MMU poisoned'
> bit that panfrost_mmu_as_get() can check.
>
> Steve
>
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index 2a9bf30edc9d..d5c624e776f1 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -661,7 +661,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >  		if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 0xC0)
> >  			ret = panfrost_mmu_map_fault_addr(pfdev, as, addr);
> >
> > -		if (ret)
> > +		if (ret) {
> >  			/* terminal fault, print info about the fault */
> >  			dev_err(pfdev->dev,
> >  				"Unhandled Page fault in AS%d at VA 0x%016llX\n"
> > @@ -679,6 +679,10 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >  				access_type, access_type_name(pfdev, fault_status),
> >  				source_id);
> >
> > +			/* Disable the MMU to stop jobs on this AS immediately */
> > +			panfrost_mmu_disable(pfdev, as);
> > +		}
> > +
> >  		status &= ~mask;
> >
> >  		/* If we received new MMU interrupts, process them before returning. */
> >
>
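
To make the 'MMU poisoned' idea discussed above a bit more concrete, here is
a rough standalone sketch in plain userspace C. It is not driver code: the
struct and every fake_mmu_*() name are made up for illustration and are not
the panfrost API, and the real panfrost_mmu_as_get() does considerably more
than this. The only point is the control flow: a terminal fault sets a flag,
the "get" path refuses new users while the flag is set, and the flag is
cleared only once the in-flight jobs have drained.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-context MMU state; NOT the driver's struct panfrost_mmu. */
struct fake_mmu {
	int as;           /* assigned address space slot, -1 if none */
	bool poisoned;    /* set after a terminal (unhandled) fault */
	int active_jobs;  /* jobs currently holding the address space */
};

/* Would be called from the fault path, next to disabling the MMU. */
static void fake_mmu_mark_poisoned(struct fake_mmu *mmu)
{
	mmu->poisoned = true;
}

/*
 * Hypothetical analogue of an "as_get" helper: returns the AS number on
 * success, or -EFAULT if the context is poisoned so that no new job can
 * start running on it.
 */
static int fake_mmu_as_get(struct fake_mmu *mmu)
{
	if (mmu->poisoned)
		return -EFAULT;

	if (mmu->as < 0)
		mmu->as = 0;  /* trivial slot assignment, for the sketch only */

	mmu->active_jobs++;
	return mmu->as;
}

/* Drop one job's hold on the address space. */
static void fake_mmu_as_put(struct fake_mmu *mmu)
{
	if (mmu->active_jobs > 0)
		mmu->active_jobs--;
}

/*
 * Recovery step: only once every in-flight job has finished is the flag
 * cleared, which is where the real driver would restore the MMU state.
 * Returns true if the context is usable again.
 */
static bool fake_mmu_try_recover(struct fake_mmu *mmu)
{
	if (mmu->active_jobs > 0)
		return false;

	mmu->poisoned = false;
	return true;
}
```

Whether a poisoned context should ever be recovered, or should just keep
failing submissions until it is closed, is exactly the open question in this
thread; the sketch shows the recoverable variant only because it is a
superset of the other one.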