On Tue, Sep 12, 2023 at 03:27:05PM +0200, Boris Brezillon wrote:
> On Fri, 25 Aug 2023 15:45:49 +0200
> Christian König <christian.koenig@xxxxxxx> wrote:
> 
> > >>>> I tried this patch with Nouveau and found a race condition:
> > >>>>
> > >>>> In drm_sched_run_job_work() the job is added to the pending_list via
> > >>>> drm_sched_job_begin(), then the run_job() callback is called and the
> > >>>> scheduled fence is signaled.
> > >>>>
> > >>>> However, in parallel drm_sched_get_cleanup_job() might be called from
> > >>>> drm_sched_free_job_work(), which picks the first job from the
> > >>>> pending_list and, for the next job on the pending_list, sets the
> > >>>> scheduled fence's timestamp field.
> > >>
> > >> Well, why can this happen in parallel? Either the work items are
> > >> scheduled to a single-threaded work queue or you have protected the
> > >> pending list with some locks.
> > >>
> > > Xe uses a single-threaded work queue, Nouveau does not (desired
> > > behavior).
> > >
> > > The list of pending jobs is protected by a lock (safe), the race is:
> > >
> > > add job to pending list
> > > run_job
> > > signal scheduled fence
> > >
> > > dequeue from pending list
> > > free_job
> > > update timestamp
> > >
> > > Once a job is on the pending list its timestamp can be accessed, which
> > > can blow up if the scheduled fence isn't signaled yet, or more
> > > specifically unless DMA_FENCE_FLAG_TIMESTAMP_BIT is set.
> 
> I'm a bit lost. How can this lead to a NULL deref? Timestamp is a
> ktime_t embedded in dma_fence, and finished/scheduled are both
> dma_fence objects embedded in drm_sched_fence. So, unless
> {job,next_job}->s_fence is NULL, or {job,next_job} itself is NULL, I
> don't really see where the NULL deref is. If s_fence is NULL, that means
> drm_sched_job_init() wasn't called (unlikely to be detected that late),
> or ->free_job()/drm_sched_job_cleanup() was called while the job was
> still in the pending list. I don't really see a situation where job
> could be NULL, to be honest.

I think the problem here was that a dma_fence's timestamp field sits in a
union together with its cb_list list_head [1]. If the timestamp is written
before the fence is actually signaled, dma_fence_signal_timestamp_locked()
will still walk the cb_list to run the callbacks registered on this
dma_fence. Since it's a union, writing the timestamp overwrites that
list_head, so the list iteration ends up dereferencing the timestamp value
as if it were list pointers.

[1] https://elixir.bootlin.com/linux/latest/source/include/linux/dma-fence.h#L87

> 
> While I agree that updating the timestamp before the fence has been
> flagged as signaled/timestamped is broken (the timestamp will be
> overwritten when dma_fence_signal(scheduled) is called), I don't see a
> situation where it would cause a NULL/invalid pointer deref. So I
> suspect there's another race causing jobs to be cleaned up while
> they're still in the pending_list.
> 
> > Ah, that problem again. No, that is actually quite harmless.
> > You just need to double-check whether DMA_FENCE_FLAG_TIMESTAMP_BIT is
> > already set, and if it's not set, don't do anything.
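
To make that concrete, here is a minimal, untested sketch of what the
guarded timestamp update in drm_sched_get_cleanup_job() could look like.
Field and variable names follow the existing scheduler code (sched, job,
next, s_fence); treat this as an illustration of the check, not a ready
patch:

	next = list_first_entry_or_null(&sched->pending_list,
					struct drm_sched_job, list);
	if (next) {
		/*
		 * dma_fence::timestamp shares a union with ::cb_list [1],
		 * so only write it once dma_fence_signal() has set
		 * DMA_FENCE_FLAG_TIMESTAMP_BIT; before that, the union
		 * still holds the callback list and writing the timestamp
		 * would corrupt it.
		 */
		if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
			     &next->s_fence->scheduled.flags))
			next->s_fence->scheduled.timestamp =
				job->s_fence->finished.timestamp;
	}

If the bit isn't set yet, the scheduled fence simply keeps the timestamp
that dma_fence_signal() assigns when it eventually signals, which matches
the "don't do anything" suggestion above.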