On Mon, Dec 20, 2021 at 08:25:05AM +0100, Christian König wrote:
> On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
> > This patchset is based on earlier work by Boris[1] that allowed to have an
> > ordered workqueue at the driver level that will be used by the different
> > schedulers to queue their timeout work. On top of that I also serialized
> > any GPU reset we trigger from within amdgpu code to also go through the same
> > ordered wq and in this way simplify somewhat our GPU reset code so we don't need
> > to protect from concurrency by multiple GPU reset triggers such as TDR on one
> > hand and sysfs trigger or RAS trigger on the other hand.
> >
> > As advised by Christian and Daniel I defined a reset_domain struct such that
> > all the entities that go through reset together will be serialized one against
> > another.
> >
> > TDR triggered by multiple entities within the same domain due to the same reason will not
> > be triggered as the first such reset will cancel all the pending resets. This is
> > relevant only to TDR timers and not to triggered resets coming from RAS or SYSFS,
> > those will still happen after the in-flight reset finishes.
> >
> > [1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezillon@xxxxxxxxxxxxx/
> >
> > P.S. Going through drm-misc-next and not amd-staging-drm-next as Boris' work hasn't landed yet there.
>
> Patches #1 and #5, #6 are Reviewed-by: Christian König
> <christian.koenig@xxxxxxx>
>
> Some minor comments on the rest, but in general absolutely looks like the
> way we want to go.

I only scrolled through quickly, but yeah I'm concurring.
-Daniel

>
> Regards,
> Christian.
>
> >
> > Andrey Grodzovsky (6):
> >   drm/amdgpu: Init GPU reset single threaded wq
> >   drm/amdgpu: Move scheduler init to after XGMI is ready
> >   drm/amdgpu: Fix crash on modprobe
> >   drm/amdgpu: Serialize non TDR gpu recovery with TDRs
> >   drm/amdgpu: Drop hive->in_reset
> >   drm/amdgpu: Drop concurrent GPU reset protection for device
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++---------
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
> >  7 files changed, 132 insertions(+), 136 deletions(-)
> >
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch