+CC Thomas Hellström

On Mon, 2024-11-18 at 12:48 +0100, Christian König wrote:
> On 15.11.24 at 17:07, Philipp Stanner wrote:
> > On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote:
> > > On 15.11.24 at 14:55, Philipp Stanner wrote:
> > > > [SNIP]
> > > > 
> > > > But you wouldn't want to aim for getting rid of struct
> > > > drm_sched_job completely, or would you?
> > > 
> > > No, the drm_sched_job structure was added to aid the single
> > > producer single consumer queue and so made it easier to put the
> > > jobs into a container.
> > > 
> > > In my mind the full solution for running jobs looks like this:
> > > 
> > > 1. Driver enqueues with drm_sched_job_arm()
> > > 2. Job ends up in the pending_list
> > > 3. Sooner or later the scheduler calls run_job()
> > > 4. In return the scheduler gets a dma_fence representing the
> > > resulting hardware operation.
> > > 5. This fence is put into a container to keep around what the hw
> > > is actually executing.
> > > 6. At some point the fence gets signaled, informing the scheduler
> > > that the next job can be pushed if enough credits are now
> > > available.
> > > 
> > > There is no free_job callback any more because after run_job is
> > > called the scheduler is done with the job, and only the dma_fence
> > > which represents the actual HW operation is the object the
> > > scheduler now works with.
> > > 
> > > We would still need something like a kill_job callback which is
> > > used when an entity is released without flushing all jobs (see
> > > drm_sched_entity_kill()), but that is then only used for a
> > > specific corner case.
> > 
> > Can't we just limit the scheduler's responsibility to telling the
> > driver that it has to flush, and if it doesn't it's a bug?
> 
> We still need to remove the pending jobs from the scheduler if
> flushing times out.
> 
> Without timing out flushing and/or aborting when the process was
> killed we run into the same problem again that we don't want to block
> on _fini().
> 
> > > Blocking for the cleanup in drm_sched_fini() then becomes a
> > > trivial dma_fence_wait() on the remaining unsignaled HW fences
> > > and eventually on the latest killed fence.
> > 
> > But that results in exactly the same situation as my waitqueue
> > solution, doesn't it?
> 
> The key part is that dma_fences have a documented requirement to
> never block infinitely.
> 
> > Blocking infinitely in drm_sched_fini():
> > 
> > If the driver doesn't guarantee that all fences will get signaled,
> > then wait_event() in the waitqueue solution will block forever. The
> > same with dma_fence_wait().
> > 
> > Or are you aiming at an interruptible dma_fence_wait(), thereby not
> > blocking SIGKILL?
> 
> That is basically what drm_sched_entity_flush() is already doing.
> 
> > That then would still result in leaks. So basically the same
> > situation as with an interruptible drm_sched_flush() function.
> > 
> > (Although I agree that would probably be more elegant)
> 
> If the wait_event really would just be waiting for dma_fences then
> yes.
> 
> The problem with the waitqueue approach is that we need to wait for
> the free_job callback, and that callback is normally called from a
> work item without holding any additional locks.
> 
> When drm_sched_fini starts to block for that we create a whole chain
> of new dependencies, since that one is most likely called from a file
> descriptor destruction.
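Just to check that I'm picturing the same end state: with free_job()
gone, I read the above as drm_sched_fini() essentially walking whatever
container holds the hardware fences from step 5 and waiting on each of
them. A minimal sketch of that idea follows; the drm_sched_hw_fence
wrapper and the hw_fences list are invented purely for illustration,
they are not existing API:

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Invented wrapper, standing in for the container from step 5 above. */
struct drm_sched_hw_fence {
        struct list_head node;
        struct dma_fence *fence;        /* what run_job() returned */
};

static void drm_sched_fini_wait_sketch(struct list_head *hw_fences)
{
        struct drm_sched_hw_fence *entry, *tmp;

        list_for_each_entry_safe(entry, tmp, hw_fences, node) {
                /*
                 * Bounded by the dma_fence contract that every fence
                 * signals in finite time, so this cannot block forever.
                 */
                dma_fence_wait(entry->fence, false);

                dma_fence_put(entry->fence);
                list_del(&entry->node);
                kfree(entry);
        }
}

If that is roughly the shape of it, then I agree that the
finite-signaling guarantee of dma_fence is what makes blocking in
_fini() acceptable, in a way that waiting for free_job() work items is
not.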
> > > The problem with this approach is that the idea of re-submitting
> > > jobs in a GPU reset by the scheduler is then basically dead. But
> > > to be honest that never worked correctly.
> > > 
> > > See the deprecated warning I already put on
> > > drm_sched_resubmit_jobs(). The job lifetime is not really well
> > > defined because of this, see the free_guilty hack as well.
> > 
> > drm_sched_resubmit_jobs() is being used by 5 drivers currently. So
> > if we go for this approach we have to port them first. That doesn't
> > sound trivial to me.
> 
> Yeah, completely agree. I think the scheduler should provide
> something like drm_sched_for_each_pending_fence() which helps to
> iterate over all the pending HW fences.
> 
> The individual drivers could then decide by themselves what to do,
> e.g. upcast the HW fence into the job and push it again or similar.

I have talked with Dave and we would be interested in exploring the
direction of getting rid of backend_ops.free_job() and doing the
modifications this implies.

Matthew, Thomas, any hard NACKs on principle, or can we look into this
in a future patch series?

P.

> But what we really need to get away from are those fence prerequisite
> violations Sima pointed out. For example we can't allocate memory for
> run_job.
> 
> Regards,
> Christian.
> 
> > P.
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Regards,
> > > > P.
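P.S.: Purely as a strawman for the drm_sched_for_each_pending_fence()
idea above (the iterator name, its signature and all driver-side types
below are invented), I would picture per-driver resubmission after a
reset looking roughly like this:

#include <linux/container_of.h>
#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

struct my_job;                                          /* invented */
static void my_driver_run_job_again(struct my_job *job);/* invented */

/* Invented driver fence that embeds the dma_fence and remembers its job. */
struct my_hw_fence {
        struct dma_fence base;
        struct my_job *job;
};

static void my_driver_resubmit_sketch(struct drm_gpu_scheduler *sched)
{
        struct dma_fence *fence;

        /* Hypothetical iterator over the scheduler's pending HW fences. */
        drm_sched_for_each_pending_fence(fence, sched) {
                struct my_hw_fence *f =
                        container_of(fence, struct my_hw_fence, base);

                /* The driver decides what to do, e.g. push the job again. */
                my_driver_run_job_again(f->job);
        }
}

That would keep the "upcast and push again" policy entirely in the
driver, with the scheduler only exposing the pending fences.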