Re: [PATCH] drm/sched: Add native dependency support to drm_sched

On Mon, 12 Jun 2023 16:59:02 +0200
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:

> Hi Christian,
> 
> On Mon, 12 Jun 2023 15:16:03 +0200
> Christian König <christian.koenig@xxxxxxx> wrote:
> 
> > On 08.06.23 15:23, Donald Robson wrote:
> > > This patch adds support for 'native' dependencies to the DRM scheduler.
> > > In drivers that use a firmware-based scheduler there are performance
> > > gains to be had by allowing waits to happen in the firmware, as this
> > > reduces the latency between signalling and job submission.
> > 
> > Well, if I'm not completely mistaken, this patch is superfluous since we
> > already use that functionality.
> > 
> > This sounds a lot like the HW dependencies we have in amdgpu. See
> > AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES.
> 
> I'll look at it in more detail. Thanks for the pointer.

I had a quick look, and it looks pretty similar, indeed.

> 
> > 
> > Basically, instead of giving a hard dependency that requires the other
> > submission to finish before the current one starts, userspace can also
> > give a soft dependency that only requires the other submission to be
> > scheduled.
> > 
> > This way the firmware can then wait for certain operations of the
> > previous submission to complete by comparing memory or registers.
> > 
> > You don't necessarily need to give control over this to userspace; if
> > your kernel driver can determine a fw-assisted wait by itself, that
> > should also work fine.
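
Just to illustrate what such a fw-assisted wait boils down to (entirely
made-up structure and field names, not taken from any real driver): the
kernel pushes the job right away and emits something like this into the
FW ring, so the wait happens on the firmware side rather than in drm_sched.

struct fw_wait_cmd {
        u64 seqno_gpu_addr;   /* address the firmware polls */
        u32 seqno_threshold;  /* start the job once *addr >= threshold */
};
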
> 
> That's what we did initially. We had a separate 'native_deps' xarray in
> pvr_job that we were managing ourselves, and that worked fine, except
> for the kill_entity() stuff. If you don't wait for those
'native fences', you're potentially signaling the job's finished fence
earlier than you should.

Hm, I think we could get drm_sched_entity_kill_jobs_cb() to do the
right thing here without teaching drm_sched about native deps. If we
turn scheduled fences back into finished fences in
drm_sched_entity_kill_jobs_cb(), something like this should work:

static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
                                          struct dma_fence_cb *cb)
{
        struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
                                                 finish_cb);
        unsigned long index;
        int r;

        dma_fence_put(f);

        /* Wait for all dependencies to avoid data corruptions */
        xa_for_each(&job->dependencies, index, f) {
                struct drm_sched_fence *s_fence = to_drm_sched_fence(f);

                /* Make sure we wait for the finished fence here, so we can
                 * guarantee that any job we depend on that is still accessing
                 * resources is done before we signal this job's finished fence
                 * and unblock further accesses on these resources.
                 */
                if (s_fence && f == &s_fence->scheduled) {
                        /* The dependencies array holds a reference on the
                         * scheduled fence, not the finished one, and the
                         * finished fence refcount might already be zero, so
                         * use dma_fence_get_rcu() and skip the dependency if
                         * the finished fence is already gone.
                         */
                        f = dma_fence_get_rcu(&s_fence->finished);
                        dma_fence_put(&s_fence->scheduled);
                        if (!f) {
                                xa_erase(&job->dependencies, index);
                                continue;
                        }
                }

                xa_erase(&job->dependencies, index);
                r = dma_fence_add_callback(f, &job->finish_cb,
                                           drm_sched_entity_kill_jobs_cb);
                if (!r)
                        return;

                dma_fence_put(f);
        }

        INIT_WORK(&job->work, drm_sched_entity_kill_jobs_work);
        schedule_work(&job->work);
}

Then, for native fences, we just have to add the scheduled fence to the deps
array, as you do (and as we did in our first version), and we should be good.
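
For completeness, the driver-side part could then look something like
this. It's just a sketch, not the actual pvr or amdgpu code, and
job_add_native_dependency() is a made-up name; only the drm_sched and
dma_fence helpers it calls are the real ones.

/* Turn a dependency on another drm_sched job into a 'native' dependency:
 * only wait for that job to be scheduled, and let the FW-side wait handle
 * completion ordering.
 */
static int job_add_native_dependency(struct drm_sched_job *job,
                                     struct dma_fence *fence)
{
        struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);

        if (s_fence) {
                struct dma_fence *old = fence;

                /* Depend on the scheduled fence instead of the finished one. */
                fence = dma_fence_get(&s_fence->scheduled);
                dma_fence_put(old);
        }

        /* drm_sched_job_add_dependency() takes ownership of the reference. */
        return drm_sched_job_add_dependency(job, fence);
}

With the drm_sched_entity_kill_jobs_cb() change above, entity teardown
still ends up waiting on the finished fences of those deps, which should
take care of the kill_entity() issue.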



