That's good as proof of the RCA, but I still think we should grab a
dedicated lock inside the scheduler. The race is internal to scheduler
code, so it is better to handle it there; that way the fix applies to
all drivers using the scheduler.

Andrey

On 10/30/19 4:44 AM, S, Shirish wrote:
>>>>
>>>> We still have it and isn't doing kthread_park()/unpark() from
>>>> drm_sched_entity_fini while GPU reset in progress defeats all the
>>>> purpose of drm_sched_stop->kthread_park ? If
>>>> drm_sched_entity_fini->kthread_unpark happens AFTER
>>>> drm_sched_stop->kthread_park nothing prevents from another (third)
>>>> thread keep submitting job to HW which will be picked up by the
>>>> unparked scheduler thread try to submit to HW but fail because the
>>>> HW ring is deactivated.
>>>>
>>>> If so maybe we should serialize calls to
>>>> kthread_park/unpark(sched->thread) ?
>>>>
>>>
>>> Yeah, that was my thinking as well. Probably best to just grab the
>>> reset lock before calling drm_sched_entity_fini().
>>
>>
>> Shirish - please try locking &adev->lock_reset around calls to
>> drm_sched_entity_fini as Christian suggests and see if this actually
>> helps the issue.
>>
> Yes that also works.
>
> Regards,
>
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
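
To make the serialization idea concrete, here is a rough user-space model of
it, not kernel code and not a proposed patch. Everything in it is hypothetical:
struct model_sched, park_lock, and the model_* functions are names invented for
illustration, a pthread mutex stands in for whatever lock the scheduler would
actually use, and a bool stands in for the parked state of sched->thread. The
assumption is that the stop path takes the lock before parking and holds it
until the start path unparks, so the entity teardown path cannot park/unpark
in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler; not the real drm_gpu_scheduler. */
struct model_sched {
    pthread_mutex_t park_lock; /* hypothetical lock serializing park/unpark */
    bool parked;               /* models whether sched->thread is parked */
};

#define MODEL_SCHED_INIT { PTHREAD_MUTEX_INITIALIZER, false }

/* Models the reset path: park the thread and keep the lock held. */
void model_sched_stop(struct model_sched *s)
{
    pthread_mutex_lock(&s->park_lock);
    s->parked = true;
}

/* Models the end of reset: unpark the thread and release the lock. */
void model_sched_start(struct model_sched *s)
{
    assert(s->parked); /* start is only valid after a matching stop */
    s->parked = false;
    pthread_mutex_unlock(&s->park_lock);
}

/*
 * Models entity teardown, which briefly parks and unparks the thread.
 * Returns false if a reset currently owns the lock. A real implementation
 * would presumably block instead; trylock just keeps this model
 * single-threaded and deterministic.
 */
bool model_entity_fini_try(struct model_sched *s)
{
    if (pthread_mutex_trylock(&s->park_lock) != 0)
        return false;          /* reset in progress: do not unpark */
    s->parked = true;          /* park */
    /* ... entity cleanup would happen here ... */
    s->parked = false;         /* unpark */
    pthread_mutex_unlock(&s->park_lock);
    return true;
}
```

With this ordering, a teardown that arrives between stop and start cannot
unpark the thread while the HW ring is deactivated, which is the window
described in the quoted text above.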