On Thu, 25 Apr 2024 10:28:49 +0100 Steven Price <steven.price@xxxxxxx> wrote: > On 25/04/2024 08:18, Boris Brezillon wrote: > > From: Antonino Maniscalco <antonino.maniscalco@xxxxxxxxxxxxx> > > > > If the kernel couldn't allocate memory because we reached the maximum > > number of chunks but no render passes are in flight > > (panthor_heap_grow() returning -ENOMEM), we should defer the OOM > > handling to the FW by returning a NULL chunk. The FW will then call > > the tiler OOM exception handler, which is supposed to implement > > incremental rendering (execute an intermediate fragment job to flush > > the pending primitives, release the tiler memory that was used to > > store those primitives, and start over from where it stopped). > > > > Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") > > Signed-off-by: Antonino Maniscalco <antonino.maniscalco@xxxxxxxxxxxxx> > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> > > Reviewed-by: Steven Price <steven.price@xxxxxxx> > > Although I think the real issue here is that we haven't clearly defined > the return values from panthor_heap_grow - it's a bit weird to have two > different error codes for the same "try again later after incremental > rendering" result. But as a fix this seems most clear. Yeah, I actually considered returning -EBUSY for the 'max_chunks reached' situation, but then realized we would also want to trigger incremental rendering for actual mem allocation failures (when chunk_count < max_chunks) once the fail-able/non-blocking allocation logic is implemented, and for this kind of failure it makes more sense to return -ENOMEM, even though this implies checking against two values instead of one. I guess returning -ENOMEM instead of -EBUSY for the case where we have render passes in-flight wouldn't be too awkward, as this can be seen as the kernel refusing to allocate more memory. > > Steve > > > --- > > drivers/gpu/drm/panthor/panthor_sched.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c > > index b3a51a6de523..6de8c0c702cb 100644 > > --- a/drivers/gpu/drm/panthor/panthor_sched.c > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c > > @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id) > > pending_frag_count, &new_chunk_va); > > } > > > > - if (ret && ret != -EBUSY) { > > + /* If the kernel couldn't allocate memory because we reached the maximum > > + * number of chunks (EBUSY if we have render passes in flight, ENOMEM > > + * otherwise), we want to let the FW try to reclaim memory by waiting > > + * for fragment jobs to land or by executing the tiler OOM exception > > + * handler, which is supposed to implement incremental rendering. > > + */ > > + if (ret && ret != -EBUSY && ret != -ENOMEM) { > > drm_warn(&ptdev->base, "Failed to extend the tiler heap\n"); > > group->fatal_queues |= BIT(cs_id); > > sched_queue_delayed_work(sched, tick, 0); >