Re: [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu,  8 Jun 2023 08:55:51 +0200
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:

> If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
> to wait on all the external dependencies (those added to
> drm_sched_job::dependencies) before signaling the job finished fence.
> This is done this way to prevent jobs depending on these canceled jobs
> from considering the resources they want to access as ready, when
> they're actually still used by other components, thus leading to
> potential data corruptions.
> 
> The problem is, the kill_jobs logic is omitting the last fence popped
> from the dependencies array that was waited upon before
> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> so we're basically waiting for all dependencies except one.
> 
> This is an attempt at fixing that, but also an opportunity to make sure
> I understood the drm_sched_entity_kill(), because I'm not 100% sure if

               ^ the drm_sched_entity_kill() logic correctly, ...

> skipping this currently popped dependency was intentional or not. I can't
> see a good reason why we'd want to skip that one, but I might be missing
> something.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> Cc: Frank Binns <frank.binns@xxxxxxxxxx>
> Cc: Sarah Walker <sarah.walker@xxxxxxxxxx>
> Cc: Donald Robson <donald.robson@xxxxxxxxxx>
> Cc: Luben Tuikov <luben.tuikov@xxxxxxx>
> Cc: David Airlie <airlied@xxxxxxxxx>
> Cc: Daniel Vetter <daniel@xxxxxxxx>
> Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
> Cc: "Christian König" <christian.koenig@xxxxxxx>
> ---
> Stumbled across this issue while working on native dependency support
> with Donald (which will be posted soon). Flagged as RFC, because I'm
> not sure this is legit, and also not sure we want to fix it this way.
> I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> because of the asynchronousity of the wait, and the fact we use
> drm_sched_entity::dependency to know if we have a clear_dep()
> callback registered, so we can easily reset it without removing the

                          ^ we can't ...

> callback.




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux