On Fri, May 17, 2019 at 10:43 PM Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx> wrote:
> On 5/17/19 3:35 PM, Erico Nunes wrote:
> > Lima currently defaults to an "infinite" timeout. Setting a 500ms
> > default timeout like most other drm_sched users do fixed the leak for
> > me.
>
> I am not very clear about the problem - so you basically never allow a
> time out handler to run ? And then when the job hangs for ever you get
> this memory leak ? How it worked for you before this refactoring ? As
> far as I remember sched->ops->free_job before this change was called
> from drm_sched_job_finish which is the work scheduled from HW fence
> signaled callback - drm_sched_process_job so if your job hangs for ever
> anyway and this work is never scheduled when your free_job callback was
> called ?

In this particular case, the jobs run successfully; nothing hangs. Lima
currently specifies an "infinite" timeout to the drm scheduler, so if a
job did hang, it would hang forever, I suppose. But this is not the
issue.

If I understand correctly, it worked well before the rework because the
cleanup was triggered at the end of drm_sched_process_job, independently
of the timeout.

What I'm observing now is that even when jobs run successfully, they are
not cleaned up by the drm scheduler, because drm_sched_cleanup_jobs seems
to give up based on the status of a timeout worker. I would expect the
timeout value to only be relevant in error/hung job cases.

I will probably set the timeout to a reasonable value anyway; I just
posted here to report that this may be a bug in the drm scheduler after
that rework.
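For reference, this is roughly what I mean by switching from the
"infinite" timeout to a finite default. A minimal sketch only: the
lima_sched_timeout_ms parameter and lima_pipe_sched_init() helper are
hypothetical names for illustration, and the drm_sched_init() argument
list is assumed from the scheduler code of this kernel series; the
actual lima code may differ.

    #include <linux/jiffies.h>
    #include <linux/module.h>
    #include <linux/sched.h>
    #include <drm/gpu_scheduler.h>

    /* 0 = "infinite" timeout (current default); >0 = timeout in ms */
    static int lima_sched_timeout_ms = 500;
    module_param_named(sched_timeout_ms, lima_sched_timeout_ms, int, 0444);

    int lima_pipe_sched_init(struct drm_gpu_scheduler *sched,
                             const struct drm_sched_backend_ops *ops,
                             const char *name)
    {
            /* With a finite timeout the timeout worker can actually run,
             * so drm_sched_cleanup_jobs is not blocked by its status. */
            long timeout = lima_sched_timeout_ms > 0 ?
                            msecs_to_jiffies(lima_sched_timeout_ms) :
                            MAX_SCHEDULE_TIMEOUT;

            /* drm_sched_init(sched, ops, hw_submission, hang_limit,
             *                timeout, name) */
            return drm_sched_init(sched, ops, 1, 0, timeout, name);
    }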