On Fri, Mar 18, 2022 at 9:04 AM Andrey Grodzovsky
<andrey.grodzovsky@xxxxxxx> wrote:
>
>
> On 2022-03-17 16:35, Rob Clark wrote:
> > On Thu, Mar 17, 2022 at 12:50 PM Andrey Grodzovsky
> > <andrey.grodzovsky@xxxxxxx> wrote:
> >>
> >> On 2022-03-17 14:25, Rob Clark wrote:
> >>> On Thu, Mar 17, 2022 at 11:10 AM Andrey Grodzovsky
> >>> <andrey.grodzovsky@xxxxxxx> wrote:
> >>>> On 2022-03-17 13:35, Rob Clark wrote:
> >>>>> On Thu, Mar 17, 2022 at 9:45 AM Christian König
> >>>>> <christian.koenig@xxxxxxx> wrote:
> >>>>>> On 17.03.22 at 17:18, Rob Clark wrote:
> >>>>>>> On Thu, Mar 17, 2022 at 9:04 AM Christian König
> >>>>>>> <christian.koenig@xxxxxxx> wrote:
> >>>>>>>> On 17.03.22 at 16:10, Rob Clark wrote:
> >>>>>>>>> [SNIP]
> >>>>>>>>> userspace frozen != kthread frozen .. that is what this patch is
> >>>>>>>>> trying to address, so we aren't racing between shutting down the hw
> >>>>>>>>> and the scheduler shoveling more jobs at us.
> >>>>>>>> Well, exactly that's the problem. The scheduler is supposed to keep
> >>>>>>>> shoveling more jobs at us until it is empty.
> >>>>>>>>
> >>>>>>>> Thinking more about it, we will then keep some dma_fence instances
> >>>>>>>> unsignaled, and that is an extremely bad idea since it can lead to
> >>>>>>>> deadlocks during suspend.
> >>>>>>> Hmm, perhaps that is true if you need to migrate things out of vram?
> >>>>>>> It is at least not a problem when vram is not involved.
> >>>>>> No, it's much wider than that.
> >>>>>>
> >>>>>> See, what can happen is that the memory management shrinkers want to
> >>>>>> wait for a dma_fence during suspend.
> >>>>> We don't wait on fences in the shrinker, we only purge or evict things
> >>>>> that are already ready. Actually, waiting on fences in the shrinker
> >>>>> path sounds like a pretty bad idea.
> >>>>>
> >>>>>> And if you stop the scheduler they will just wait forever.
> >>>>>>
> >>>>>> What you need to do instead is to drain the scheduler, e.g. call
> >>>>>> drm_sched_entity_flush() with a proper timeout for each entity you
> >>>>>> have created.
> >>>>> Yeah, it would work to drain the scheduler.. I guess that might be the
> >>>>> more portable approach as far as a generic solution for suspend goes.
> >>>>>
> >>>>> BR,
> >>>>> -R
> >>>> I am not sure how this drains the scheduler? Suppose we have done the
> >>>> waiting in drm_sched_entity_flush; what prevents someone from pushing
> >>>> another job into the same entity's queue right after that?
> >>>> Shouldn't we first disable further pushing of jobs into the entity
> >>>> before we wait for sched->job_scheduled?
> >>>>
> >>> In the system suspend path, userspace processes will have already been
> >>> frozen, so there should be no way to push more jobs to the scheduler,
> >>> unless they are pushed from the kernel itself.
> >>> amdgpu_device_suspend
> >>
> >> That was my suspicion, but I wasn't sure about it.
> >>
> >>
> >>> We don't do that in drm/msm, but maybe you need to move things btwn
> >>> vram and system memory?
> >>
> >> Exactly, that was my main concern - if we use this method we have to
> >> use it at a point in the suspend sequence when all in-kernel job
> >> submission activity is already suspended
> >>
> >>> But even in that case, if the # of jobs you push is bounded I guess
> >>> that is ok?
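To make the drain approach above concrete, here is a minimal sketch of
what a driver's system-suspend path could look like. Only
drm_sched_entity_flush() is the real scheduler helper being discussed;
the foo_* names, the entity array, and the 5 second timeout are made-up
placeholders, not any actual driver's code:

#include <drm/gpu_scheduler.h>

static int foo_suspend(struct foo_dev *fdev)
{
        unsigned int i;

        /* Drain: wait (bounded) for each entity's queued jobs to reach
         * the hw ring, so no dma_fence is left stuck behind a stopped
         * scheduler.  The timeout is in jiffies.
         */
        for (i = 0; i < fdev->num_entities; i++)
                drm_sched_entity_flush(&fdev->entities[i],
                                       msecs_to_jiffies(5000));

        /* Only now is it safe to idle and power down the hardware. */
        return foo_halt_hw(fdev);
}
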
> >> Submissions to scheduler entities use an unbounded queue; the bounded
> >> part is when you extract the next job from the entity to submit to the
> >> HW ring, and it rejects the job if the submission limit is reached
> >> (drm_sched_ready)
> >>
> >> In general - it looks to me at least that what we want here is more of
> >> a drain operation than a flush (i.e. we first want to disable any
> >> further job submission to the entity's queue and then flush all
> >> in-flight ones). As an example of this I was looking at
> >> flush_workqueue vs. drain_workqueue
> > Would it be possible for amdgpu to, in the system suspend task,
> >
> > 1) first queue up all the jobs needed to migrate bos out of vram, and
> > whatever other housekeeping jobs are needed
> > 2) then drain the gpu scheduler's queues
> > 3) and then finally wait for jobs executing on the GPU to complete
>
>
> We already do most of it in amdgpu_device_suspend:
> amdgpu_device_ip_suspend_phase1 followed by amdgpu_device_evict_resources
> followed by amdgpu_fence_driver_hw_fini is exactly steps 1 + 3. What we
> are missing is step 2). For this step I suggest adding a function called
> drm_sched_entity_drain which basically sets entity->stopped = true and
> then calls drm_sched_entity_flush.
> This will both reject any new insertions into the entity's job queue and
> flush all pending job submissions to HW from that entity.
> One point is we need to make drm_sched_entity_push_job return a value so
> the caller knows about job enqueue rejection.

Hmm, seems like a job enqueue that is rejected because we are in the
process of suspending should be more of a WARN_ON() sort of thing? Not
sure if there is anything sensible for the caller to do at that point?

>
> What about runtime suspend? I guess the same issue with the scheduler
> racing against HW suspend is relevant there?

Runtime suspend should be ok, as long as the driver holds a runpm
reference whenever the hw needs to be awake. The problem with system
suspend (at least if you are using pm_runtime_force_suspend() or doing
something equivalent) is that it bypasses the runpm reference. (Which,
IMO, seems like a bad design..)

> Also, could you point to a particular buggy scenario where the race
> between the SW scheduler and suspend is causing a problem?

I wrote a piglit test[1] to try to trigger this scenario.. it isn't
really that easy to hit

BR,
-R

[1] https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/643

> Andrey
>
>
> >
> > BR,
> > -R
> >
> >> Andrey
> >>
> >>
> >>> BR,
> >>> -R
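To make the drm_sched_entity_drain() proposal above concrete, one
possible shape for it (a sketch only, not a tested patch; it assumes
entity->stopped is protected by entity->rq_lock, as in the existing
entity teardown code):

#include <drm/gpu_scheduler.h>

long drm_sched_entity_drain(struct drm_sched_entity *entity, long timeout)
{
        /* Step 1: make drm_sched_entity_push_job() reject any further
         * submissions to this entity.
         */
        spin_lock(&entity->rq_lock);
        entity->stopped = true;
        spin_unlock(&entity->rq_lock);

        /* Step 2: flush jobs already in the entity's queue out to the
         * hw ring, bounded by the caller's timeout (in jiffies).
         */
        return drm_sched_entity_flush(entity, timeout);
}

Once drm_sched_entity_push_job() reports the rejection, a caller in the
suspend path could then WARN_ON() it, per the comment above.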
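On the runtime-suspend point, the usual pattern is to bracket each job's
lifetime with a runtime-pm reference, along the lines of the generic
sketch below (the pm_runtime_* calls are the standard kernel API; the
foo_* names are hypothetical):

#include <linux/pm_runtime.h>

static int foo_submit(struct foo_dev *fdev, struct foo_job *job)
{
        int ret;

        /* Take a runpm reference that is held for as long as this job
         * keeps the hw busy; runtime suspend is blocked meanwhile.
         */
        ret = pm_runtime_get_sync(fdev->dev);
        if (ret < 0) {
                pm_runtime_put_noidle(fdev->dev); /* balance failed get */
                return ret;
        }

        return foo_push_job(job);
}

/* Called when the job's hw fence signals. */
static void foo_job_retire(struct foo_dev *fdev)
{
        pm_runtime_mark_last_busy(fdev->dev);
        pm_runtime_put_autosuspend(fdev->dev); /* hw may now suspend */
}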