OK, I will update patches 1 and 2 and given your RBs push them since they fix some races. I will then update and test patch 3 on some basic scenarios and will send it for separate review where I might put a TODO comment in code with my objections regarding long jobs form our discussion so you can see and reply on that. Andrey On 01/24/2019 06:34 AM, Koenig, Christian wrote: > I see a few cleanups on Patch #3 which actually belong in patch #1: > >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct >> drm_sched_job *bad) > The "bad" job parameter actually isn't used any more, isn't it? > >> +retry_wait: > Not used any more. > > But apart from that at least patch #1 and #2 look like they can have my > rb now. > > Patch #3 looks also like it should work after a bit of polishing. > > Thanks, > Christian. > > Am 18.01.19 um 20:15 schrieb Grodzovsky, Andrey: >> Attached series is the first 2 patches we already discussed about ring >> mirror list handling racing with all your comments fixed (still not >> committed). The third patch is a prototype based on the first 2 patches >> and on our discussion. >> >> Please take a look. >> >> Andrey >> >> >> On 01/18/2019 01:32 PM, Koenig, Christian wrote: >>> Am 18.01.19 um 18:34 schrieb Grodzovsky, Andrey: >>>> On 01/18/2019 12:10 PM, Koenig, Christian wrote: >>>>> Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey: >>>>>> On 01/18/2019 04:25 AM, Koenig, Christian wrote: >>>>>>> [SNIP] >>>>>>>>>>> Re-arming the timeout should probably have a much reduced value >>>>>>>>>>> when the job hasn't changed. E.g. something like a few ms. >>>>>>>> Now i got thinking about non hanged job in progress (job A) and let's >>>>>>>> say it's a long job , it just started executing but due to time out of >>>>>>>> another job (job B) on another (or this scheduler) it's parent cb got >>>>>>>> disconnected, we disarmed the tdr timer for the job's scheduler, >>>>>>>> meanwhile the timed out job did manage to complete before HW reset >>>>>>>> check and hence we skip HW reset, attach back the cb and rearm job's A >>>>>>>> tdr timer with a future value of few ms only - aren't we going to get >>>>>>>> false tdr triggered on job B now because we didn't let it enough time >>>>>>>> to run and complete ? I would prefer the other extreme of longer time >>>>>>>> for time out to trigger then false TDR. Optimally we would have per >>>>>>>> job timer and rearm to exactly the reminder of it's time out value - >>>>>>>> but we gave up on per job tdr work long ago. >>>>>>> Well we only re-arm the timeout with a shorter period if it already >>>>>>> triggered once. If we just suspend the timeout then we should still use >>>>>>> the longer period. >>>>>> Can you explain more on this ? I don't get it. >>>>> See drm_sched_job_timedout(), we re-arm the timeout at the end of the >>>>> procedure. >>>>> >>>>> We should change that and re-arm the timer with a much lower timeout if >>>>> the job is still not finished. >>>>> >>>>> Christian. >>>> I still don't see how this can fix the problem of of long job in >>>> progress triggering false tdr if no HW reset was done, but maybe I am >>>> missing other pieces you have in mind, I will finish the patch and send >>>> it and then we can be more specific based on the code. >>> Ok sounds good. We should probably discuss less on details and prototype >>> a bit more. >>> >>> Might be that I'm missing something here as well, so probably good to >>> have some code to talk about things more directly. >>> >>> Christian. >>> >>>> Andrey >>>> >>>>>> Andrey >>>>>> >>>>>>>> In general the more i think about it (correct me if I am wrong) I am >>>>>>>> less sure how much the optimization feature is useful - if job's time >>>>>>>> out did trigger what are the chances that the little more time we give >>>>>>>> it between beginning of tdr function and the time we do start the >>>>>>>> actual HW reset will be exactly what it needed to complete. Also, this >>>>>>>> is still not water proof as the job might complete and signal it's HW >>>>>>>> fence exactly after we checked for completion but before starting the >>>>>>>> HW reset code. >>>>>>> I don't see this as an optimization, but rather as mandatory for correct >>>>>>> operation. >>>>>>> >>>>>>> See without this we can run into issues because we execute jobs multiple >>>>>>> times. That can still happen with this clean handling, but it is much >>>>>>> more unlikely. >>>>>>> >>>>>>> Christian. >>>>>>> >>>>>>>> Andrey >>>>>>>> >>>>>>>>>> By unchanged you mean when we didn't resubmit the job because of the >>>>>>>>>> optimized non HW reset, right ? >>>>>>>>> Correct, yes. >>>>>>>>> >>>>>>>>>>>> About flushing tdr jobs in progress from .free_job cb - looks like >>>>>>>>>>>> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we >>>>>>>>>>>> still need to take care of flushing all sced->work_tdr for a >>>>>>>>>>>> device and for all devices in hive for XGMI. >>>>>>>>>>>> What do you think ? >>>>>>>>>>> Why should that be necessary? We only wait for the delayed work to >>>>>>>>>>> make sure that the job is not destroyed while dealing with it. >>>>>>>>>>> >>>>>>>>>>> Christian. >>>>>>>>>> But we might not be waiting for the correct sched->work_tdr, we do >>>>>>>>>> the reset routine for all schedulers in a device accessing their >>>>>>>>>> jobs too and not only for the scheduler to which the job belongs. >>>>>>>>>> For XGMI not only that, we reset all the devices in the hive. >>>>>>>>> That is harmless you only need to wait for the work_tdr of the >>>>>>>>> current scheduler, not for all of them. >>>>>>>>> >>>>>>>>>> I was thinking, amdgpu driver is not even interested in allowing >>>>>>>>>> multiple sced->tdr to execute together - we have to serialize all of >>>>>>>>>> them anyway with the trylock mutex (even without XGMI), v3d in >>>>>>>>>> v3d_job_timedout seems also to reset all of his schedulers from the >>>>>>>>>> tdr work. Would it make sense to provide the sched->work_td as init >>>>>>>>>> parameter to scheduler (same one for all schedulers) so we can >>>>>>>>>> enforce serialization by disallowing more then 1 tdr work to execute >>>>>>>>>> in the same time ? Other drivers interested to do in parallel can >>>>>>>>>> provide unique sched->work_tdr per scheduler. This does imply >>>>>>>>>> drm_sched_job_timedout has to removed and delegated to specific >>>>>>>>>> driver implementation as probably other code dealing with >>>>>>>>>> sched->work_tdr... Maybe even move tdr handling to the driver all >>>>>>>>>> together ? >>>>>>>>> Yeah, I was thinking something similar. The problem with this >>>>>>>>> approach is that a delayed work item can have only one delay, but for >>>>>>>>> multiple engines we need multiple delays. >>>>>>>>> >>>>>>>>> What we could do is to make it a timer instead and raise the work >>>>>>>>> item from the device specific callback. >>>>>>>>> >>>>>>>>> But that doesn't really saves us the stop all schedulers trouble, so >>>>>>>>> it doesn't buy us much in the end if I see this correctly. >>>>>>>>> >>>>>>>>> Christian. >>>>>>> _______________________________________________ >>>>>>> amd-gfx mailing list >>>>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx