On 25/04/2024 08:18, Boris Brezillon wrote:
> From: Antonino Maniscalco <antonino.maniscalco@xxxxxxxxxxxxx>
>
> If the kernel couldn't allocate memory because we reached the maximum
> number of chunks but no render passes are in flight
> (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> handling to the FW by returning a NULL chunk. The FW will then call
> the tiler OOM exception handler, which is supposed to implement
> incremental rendering (execute an intermediate fragment job to flush
> the pending primitives, release the tiler memory that was used to
> store those primitives, and start over from where it stopped).
>
> Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> Signed-off-by: Antonino Maniscalco <antonino.maniscalco@xxxxxxxxxxxxx>
> Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>

Reviewed-by: Steven Price <steven.price@xxxxxxx>

Although I think the real issue here is that we haven't clearly defined
the return values from panthor_heap_grow - it's a bit weird to have two
different error codes for the same "try again later after incremental
rendering" result. But as a fix this seems most clear.
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index b3a51a6de523..6de8c0c702cb 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
>  					    pending_frag_count, &new_chunk_va);
>  	}
>
> -	if (ret && ret != -EBUSY) {
> +	/* If the kernel couldn't allocate memory because we reached the maximum
> +	 * number of chunks (EBUSY if we have render passes in flight, ENOMEM
> +	 * otherwise), we want to let the FW try to reclaim memory by waiting
> +	 * for fragment jobs to land or by executing the tiler OOM exception
> +	 * handler, which is supposed to implement incremental rendering.
> +	 */
> +	if (ret && ret != -EBUSY && ret != -ENOMEM) {
>  		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
>  		group->fatal_queues |= BIT(cs_id);
>  		sched_queue_delayed_work(sched, tick, 0);
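
For reference, the decision made by the patched condition can be sketched
as a standalone helper. This is only an illustration of the three outcomes
discussed in this thread, not driver code: the enum and the function name
classify_heap_grow_ret() are hypothetical and do not exist in panthor.

```c
#include <errno.h>

/* Hypothetical names, for illustration only; not part of the panthor driver. */
enum heap_grow_action {
	HEAP_GROW_CHUNK_READY,	/* ret == 0: a new chunk was allocated */
	HEAP_GROW_DEFER_TO_FW,	/* -EBUSY or -ENOMEM: return a NULL chunk to the FW */
	HEAP_GROW_FATAL,	/* any other error: the queue is flagged as fatal */
};

/*
 * Map a panthor_heap_grow()-style return value to the action taken by the
 * patched check (ret && ret != -EBUSY && ret != -ENOMEM).
 */
static enum heap_grow_action classify_heap_grow_ret(int ret)
{
	if (!ret)
		return HEAP_GROW_CHUNK_READY;

	/* Both codes mean "chunk limit reached, let the FW reclaim memory". */
	if (ret == -EBUSY || ret == -ENOMEM)
		return HEAP_GROW_DEFER_TO_FW;

	return HEAP_GROW_FATAL;
}
```

Written this way, the two error codes visibly collapse into a single
outcome, which is the oddity pointed out in the review comment above.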