Re: [PATCH v2 05/12] drm/panfrost: Disable the AS on unhandled page faults

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Mon, 21 Jun 2021 17:32:54 +0200

On Mon, 21 Jun 2021 16:09:32 +0100
Steven Price <steven.price@xxxxxxx> wrote:

> On 21/06/2021 14:39, Boris Brezillon wrote:
> > If we don't do that, we have to wait for the job timeout to expire
> > before the faulting job gets killed.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>  
> 
> Don't we need to do something here to allow recovery of the MMU context
> in the future? panfrost_mmu_disable() will zero out the MMU registers on
> the hardware, but AFAICS panfrost_mmu_enable() won't be called to
> restore the values until something evicts the address space (GPU power
> down/reset or just too many other processes).
> 
> The ideal would be to block submission of new jobs from this context and
> then wait until existing jobs have completed at which point the MMU
> state can be restored and jobs allowed again.

Uh, I assumed it'd be okay to let subsequent jobs coming from
this context fail with a BUS_FAULT until the context is closed. But
what you suggest seems more robust.
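
For illustration, a rough standalone model of the "block new submissions,
drain, then restore" idea. This is not driver code: struct ctx_model and the
ctx_*() helpers are hypothetical names, the -EFAULT return value is an
arbitrary choice, and the "restore" step only stands in for wherever
panfrost_mmu_enable() would end up being called.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical per-context state; NOT the real struct panfrost_mmu. */
struct ctx_model {
	bool as_disabled;	/* AS was disabled after an unhandled fault */
	int in_flight;		/* jobs submitted but not yet completed */
	int restores;		/* how many times the AS state was restored */
};

/* Restore the AS only once every job of the faulty context has drained. */
static void ctx_try_restore(struct ctx_model *c)
{
	if (c->as_disabled && c->in_flight == 0) {
		c->as_disabled = false;	/* stand-in for re-enabling the MMU */
		c->restores++;
	}
}

/* Called from the (modelled) fault handler on a terminal fault. */
static void ctx_fault(struct ctx_model *c)
{
	c->as_disabled = true;
	ctx_try_restore(c);	/* nothing in flight: recover right away */
}

/* Refuse new jobs while the context is waiting for recovery. */
static int ctx_submit(struct ctx_model *c)
{
	if (c->as_disabled)
		return -EFAULT;	/* assumption: any error code would do */
	c->in_flight++;
	return 0;
}

/* Called when a job completes, successfully or not. */
static void ctx_job_done(struct ctx_model *c)
{
	if (c->in_flight > 0)
		c->in_flight--;
	ctx_try_restore(c);
}
```

In this model, a context with two jobs in flight that takes a fault rejects
further submissions until both jobs have completed, and accepts them again
afterwards. Locking is deliberately left out; a real implementation would
need to serialize these paths.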

> 
> But at a minimum I think we should have something like an 'MMU poisoned'
> bit that panfrost_mmu_as_get() can check.
> 
> Steve
> 
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index 2a9bf30edc9d..d5c624e776f1 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -661,7 +661,7 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >  		if ((status & mask) == BIT(as) && (exception_type & 0xF8) == 0xC0)
> >  			ret = panfrost_mmu_map_fault_addr(pfdev, as, addr);
> >  
> > -		if (ret)
> > +		if (ret) {
> >  			/* terminal fault, print info about the fault */
> >  			dev_err(pfdev->dev,
> >  				"Unhandled Page fault in AS%d at VA 0x%016llX\n"
> > @@ -679,6 +679,10 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >  				access_type, access_type_name(pfdev, fault_status),
> >  				source_id);
> >  
> > +			/* Disable the MMU to stop jobs on this AS immediately */
> > +			panfrost_mmu_disable(pfdev, as);
> > +		}
> > +
> >  		status &= ~mask;
> >  
> >  		/* If we received new MMU interrupts, process them before returning. */
> >   
>