On Tue, Sep 12, 2023 at 03:27:05PM +0200, Boris Brezillon wrote:
> On Fri, 25 Aug 2023 15:45:49 +0200
> Christian König <christian.koenig@xxxxxxx> wrote:
> 
> > >>>> I tried this patch with Nouveau and found a race condition:
> > >>>>
> > >>>> In drm_sched_run_job_work() the job is added to the pending_list via
> > >>>> drm_sched_job_begin(), then the run_job() callback is called and the
> > >>>> scheduled fence is signaled.
> > >>>>
> > >>>> However, in parallel drm_sched_get_cleanup_job() might be called from
> > >>>> drm_sched_free_job_work(), which picks the first job from the
> > >>>> pending_list and, for the next job on the pending_list, sets the
> > >>>> scheduled fence's timestamp field.
> > >>
> > >> Well, why can this happen in parallel? Either the work items are
> > >> scheduled to a single-threaded work queue or you have protected the
> > >> pending list with some locks.
> > >>
> > > Xe uses a single-threaded work queue, Nouveau does not (desired
> > > behavior).
> > >
> > > The list of pending jobs is protected by a lock (safe), the race is:
> > >
> > > add job to pending list
> > > run_job
> > > signal scheduled fence
> > >
> > > dequeue from pending list
> > > free_job
> > > update timestamp
> > >
> > > Once a job is on the pending list its timestamp can be accessed, which
> > > can blow up if the scheduled fence isn't signaled yet, or more
> > > specifically unless DMA_FENCE_FLAG_TIMESTAMP_BIT is set.
> 
> I'm a bit lost. How can this lead to a NULL deref? Timestamp is a
> ktime_t embedded in dma_fence, and finished/scheduled are both
> dma_fence objects embedded in drm_sched_fence. So, unless
> {job,next_job}->s_fence is NULL, or {job,next_job} itself is NULL, I
> don't really see where the NULL deref is. If s_fence is NULL, that means
> drm_sched_job_init() wasn't called (unlikely to be detected that late),
> or ->free_job()/drm_sched_job_cleanup() was called while the job was
> still in the pending list. I don't really see a situation where job
> could be NULL, to be honest.

I think the problem here was that a dma_fence's timestamp field sits in a
union together with its cb_list list_head [1]. If the timestamp is written
before the fence is actually signaled, dma_fence_signal_timestamp_locked()
will still walk the cb_list to run the callbacks registered on this
dma_fence. Since it's a union, writing the timestamp overwrites that
list_head, so the list iteration ends up dereferencing the timestamp value
as if it were list pointers.

[1] https://elixir.bootlin.com/linux/latest/source/include/linux/dma-fence.h#L87

> 
> While I agree that updating the timestamp before the fence has been
> flagged as signaled/timestamped is broken (the timestamp will be
> overwritten when dma_fence_signal(scheduled) is called), I don't see a
> situation where it would cause a NULL/invalid pointer deref. So I
> suspect there's another race causing jobs to be cleaned up while
> they're still in the pending_list.
> 
> > Ah, that problem again. No, that is actually quite harmless.
> > You just need to double-check whether DMA_FENCE_FLAG_TIMESTAMP_BIT is
> > already set, and if it's not set, don't do anything.
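
To make that concrete, here is a minimal, untested sketch of what the
guarded timestamp update in drm_sched_get_cleanup_job() could look like.
Field and variable names follow the existing scheduler code (sched, job,
next, s_fence); treat this as an illustration of the check, not a ready
patch:

	next = list_first_entry_or_null(&sched->pending_list,
					struct drm_sched_job, list);
	if (next) {
		/*
		 * dma_fence::timestamp shares a union with ::cb_list [1],
		 * so only write it once dma_fence_signal() has set
		 * DMA_FENCE_FLAG_TIMESTAMP_BIT; before that, the union
		 * still holds the callback list and writing the timestamp
		 * would corrupt it.
		 */
		if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
			     &next->s_fence->scheduled.flags))
			next->s_fence->scheduled.timestamp =
				job->s_fence->finished.timestamp;
	}

If the bit isn't set yet, the scheduled fence simply keeps the timestamp
that dma_fence_signal() assigns when it eventually signals, which matches
the "don't do anything" suggestion above.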