I see. OK, I will add to myself a TODO about struct completion approach. Andrey On 10/30/19 11:00 AM, Koenig, Christian wrote: > Yeah, and exactly that's the problem :) You need a global lock covering > all schedulers. > > Otherwise you end up in hell's kitchen again with taking all those locks > in the right order. > > Christian. > > Am 30.10.19 um 15:56 schrieb Grodzovsky, Andrey: >> Can you elaborate on what is the tricky part with the lock ? I assumed >> we just use per scheduler lock. >> >> Andrey >> >> On 10/30/19 10:50 AM, Christian König wrote: >>> A lock inside the scheduler is rather tricky to implement. >>> >>> What you need to do is to get rid of the park()/unpark() hack in >>> drm_sched_entity_fini(). >>> >>> We could do this with a struct completion or convert the scheduler >>> from a thread to a work item. >>> >>> Regards, >>> Christian. >>> >>> Am 30.10.19 um 15:44 schrieb Grodzovsky, Andrey: >>>> That good as proof of RCA but I still think we should grab a dedicated >>>> lock inside scheduler since the race is internal to scheduler code so >>>> this better to handle it inside the scheduler code to make the fix apply >>>> for all drivers using it. >>>> >>>> Andrey >>>> >>>> On 10/30/19 4:44 AM, S, Shirish wrote: >>>>>>>> We still have it and isn't doing kthread_park()/unpark() from >>>>>>>> drm_sched_entity_fini while GPU reset in progress defeats all the >>>>>>>> purpose of drm_sched_stop->kthread_park ? If >>>>>>>> drm_sched_entity_fini-> kthread_unpark happens AFTER >>>>>>>> drm_sched_stop->kthread_park nothing prevents from another (third) >>>>>>>> thread keep submitting job to HW which will be picked up by the >>>>>>>> unparked scheduler thread try to submit to HW but fail because the >>>>>>>> HW ring is deactivated. >>>>>>>> >>>>>>>> If so maybe we should serialize calls to >>>>>>>> kthread_park/unpark(sched->thread) ? >>>>>>>> >>>>>>> Yeah, that was my thinking as well. Probably best to just grab the >>>>>>> reset lock before calling drm_sched_entity_fini(). >>>>>> Shirish - please try locking &adev->lock_reset around calls to >>>>>> drm_sched_entity_fini as Christian suggests and see if this actually >>>>>> helps the issue. >>>>>> >>>>> Yes that also works. >>>>> >>>>> Regards, >>>>> >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx