Andrey Grodzovsky <Andrey.Grodzovsky at amd.com> writes:

> On 04/25/2018 11:29 AM, Eric W. Biederman wrote:
>
>> Another issue is changing wait_event_killable to wait_event_timeout, where I need
>> to understand what timeout value is acceptable for all the drivers using the
>> scheduler, or maybe it should come as a property of drm_sched_entity.
>>
>> It would not surprise me if you could pick a large value like 1 second
>> and issue a warning if that timeout ever triggers. It sounds like the
>> condition where we wait indefinitely today is because something went
>> wrong in the driver.
>
> We wait here for all GPU jobs in flight which belong to the dying entity to
> complete. The driver submits the GPU jobs, but the content of a job is not
> under the driver's control and could take a long time to finish or even hang
> (e.g. a graphics or compute shader). I guess that's why the wait was
> originally indefinite.

I am ignorant of what user space expects or what the semantics of the
subsystem are here, so I might be completely off base. But this
wait-for-a-long-time behavior is something I would expect much more from
an f_op->flush or f_op->fsync method.

fsync so it could be obtained without closing the file descriptor.
flush so that you could get a return value out to close.

But I honestly don't know semantically what your userspace applications
expect and/or require, so I can really only say: those are weird semantics.

Eric