Re: [PATCH] drm/amdgpu: guard ib scheduling while in reset

"S, Shirish" <sshankar@xxxxxxx> · Wed, 30 Oct 2019 08:44:30 +0000

On 10/25/2019 9:32 PM, Grodzovsky, Andrey wrote:






On 10/25/19 11:57 AM, Koenig, Christian wrote:



On 25.10.19 at 17:35, Grodzovsky, Andrey wrote:






On 10/25/19 5:26 AM, Koenig, Christian wrote:



On 25.10.19 at 11:22, S, Shirish wrote:






On 10/25/2019 2:23 PM, Koenig, Christian wrote:





amdgpu_do_asic_reset starting to resume blocks


...


amdgpu 0000:03:00.0: couldn't schedule ib on ring <sdma0>

[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

...
amdgpu_device_ip_resume_phase2 resumed gfx_v9_0


amdgpu_device_ip_resume_phase2 resumed sdma_v4_0
amdgpu_device_ip_resume_phase2 resumed powerplay


This is the root of the problem.



The scheduler should never be resumed before we are done bringing the hardware back into a usable state.



I don't see the scheduler being resumed when the IB is scheduled; it is resumed well after the hardware is ready in the reset code path.


Below are the logs:
amdgpu 0000:03:00.0: GPU reset begin!

amdgpu_device_gpu_recover calling drm_sched_stop             <==

...

amdgpu 0000:03:00.0: couldn't schedule ib on ring <sdma0>

[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

...

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

amdgpu_device_ip_resume_phase2 resumed dm

...

[drm] recover vram bo from shadow done

amdgpu_device_gpu_recover calling  drm_sched_start         <==

...
As mentioned in the call trace, drm_sched_main() is responsible for this job_run which seems to be called during cleanup.



Then the scheduler isn't stopped for some reason and we need to investigate why.



We used to have another kthread_park()/unpark() in drm_sched_entity_fini(), maybe an application is crashing while we are trying to reset the GPU?





We still have it, and doesn't calling kthread_park()/unpark() from drm_sched_entity_fini() while a GPU reset is in progress defeat the whole purpose of drm_sched_stop->kthread_park? If drm_sched_entity_fini->kthread_unpark happens AFTER drm_sched_stop->kthread_park, nothing prevents another (third) thread from continuing to submit jobs. Those jobs will be picked up by the unparked scheduler thread, which will try to submit them to the HW but fail because the HW ring is deactivated.



If so, maybe we should serialize calls to kthread_park/unpark(sched->thread)?





Yeah, that was my thinking as well. Probably best to just grab the reset lock before calling drm_sched_entity_fini().
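
To make the suggestion concrete, here is a minimal userspace sketch of the race and of the proposed serialization. It is not the actual driver change. Everything prefixed toy_ is hypothetical, the park state and adev->lock_reset are modeled as plain flags, and where the real code would block on the lock the sketch simply returns -EBUSY.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins only; these are NOT the real DRM scheduler types. */
struct toy_sched {
    bool parked;        /* models the scheduler thread's park flag */
    bool ring_ready;    /* models ring->sched.ready */
    bool reset_locked;  /* models adev->lock_reset being held */
};

static void toy_sched_init(struct toy_sched *s)
{
    s->parked = false;
    s->ring_ready = true;
    s->reset_locked = false;
}

/* What the scheduler thread would do if it tried to run one job now. */
static int toy_run_job(const struct toy_sched *s)
{
    if (s->parked)
        return 0;        /* thread is parked: nothing is submitted */
    if (!s->ring_ready)
        return -EINVAL;  /* the "couldn't schedule ib" case */
    return 1;            /* job reached the ring */
}

/* Reset path: take the lock, park the thread, suspend the ring. */
static void toy_reset_begin(struct toy_sched *s)
{
    s->reset_locked = true;
    s->parked = true;
    s->ring_ready = false;
}

/* Reset path: resume the ring, unpark the thread, drop the lock. */
static void toy_reset_end(struct toy_sched *s)
{
    s->ring_ready = true;
    s->parked = false;
    s->reset_locked = false;
}

/* Unserialized teardown: its park/unpark pair clears the reset's park. */
static void toy_entity_fini_unlocked(struct toy_sched *s)
{
    s->parked = true;
    s->parked = false;
}

/* Serialized teardown: refuses to touch the park flag during a reset. */
static int toy_entity_fini_locked(struct toy_sched *s)
{
    if (s->reset_locked)
        return -EBUSY;   /* the real code would wait for the lock here */
    toy_entity_fini_unlocked(s);
    return 0;
}
```

In this model, calling toy_entity_fini_unlocked() between toy_reset_begin() and toy_reset_end() makes toy_run_job() return -EINVAL, while toy_entity_fini_locked() leaves the thread parked until the reset has finished.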





Shirish - please try locking &adev->lock_reset around calls to drm_sched_entity_fini as Christian suggests and see if this actually helps the issue.

Yes, that also works.
Regards,
Shirish S



Andrey





Alternatively, I think we could change the kthread_park/unpark to a wait_event_.... in drm_sched_entity_fini().
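
One possible reading of this alternative, sketched in userspace with pthreads rather than the kernel's wait_event helpers: the teardown path waits until the scheduler thread signals that it is done with the entity, and never touches the park flag that the reset path relies on. All toy_ names and the idle condition itself are assumptions made for illustration; the thread does not say which condition would be waited on.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy entity; hypothetical, not the real drm_sched_entity. */
struct toy_entity {
    pthread_mutex_t lock;
    pthread_cond_t  idle_cond;
    bool            idle;         /* scheduler no longer touches the entity */
    bool            sched_parked; /* park flag, owned by the reset path only */
};

static void toy_entity_init(struct toy_entity *e, bool sched_parked)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->idle_cond, NULL);
    e->idle = false;
    e->sched_parked = sched_parked;
}

/* Called by the toy scheduler thread once it is done with the entity. */
static void toy_entity_mark_idle(struct toy_entity *e)
{
    pthread_mutex_lock(&e->lock);
    e->idle = true;
    pthread_cond_broadcast(&e->idle_cond);
    pthread_mutex_unlock(&e->lock);
}

/* Teardown: wait for the idle condition; sched_parked is left untouched. */
static void toy_entity_fini_wait(struct toy_entity *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->idle)
        pthread_cond_wait(&e->idle_cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Thread body standing in for the scheduler finishing with the entity. */
static void *toy_sched_thread(void *arg)
{
    toy_entity_mark_idle(arg);
    return NULL;
}
```

Because the teardown only waits and never unparks, a reset that has already parked the scheduler thread stays in control of when it runs again.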



Regards,

Christian.




Andrey





That would be rather unlikely, and it would be rather hard to reproduce, but it is currently my best bet for what's going wrong here.



Regards,

Christian.




Regards,
Shirish S





Regards,

Christian.



On 25.10.19 at 10:50, S, Shirish wrote:



Here is the call trace:

Call Trace:

 dump_stack+0x4d/0x63

 amdgpu_ib_schedule+0x86/0x4b7

 ? __mod_timer+0x21e/0x244

 amdgpu_job_run+0x108/0x178

 drm_sched_main+0x253/0x2fa

 ? remove_wait_queue+0x51/0x51

 ? drm_sched_cleanup_jobs.part.12+0xda/0xda

 kthread+0x14f/0x157

 ? kthread_park+0x86/0x86

 ret_from_fork+0x22/0x40

amdgpu 0000:03:00.0: couldn't schedule ib on ring <sdma0>

[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)





printed via below change:

@@ -151,6 +152,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,

        }



        if (!ring->sched.ready) {

+              dump_stack();

                dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", ring->name);

                return -EINVAL;





On 10/24/2019 10:00 PM, Christian König wrote:



On 24.10.19 at 17:06, Grodzovsky, Andrey wrote:






On 10/24/19 7:01 AM, Christian König wrote:



On 24.10.19 at 12:58, S, Shirish wrote:

[Why] 

Upon GPU reset, kernel cleans up already submitted jobs 

via drm_sched_cleanup_jobs. 

This schedules ib's via drm_sched_main()->run_job, leading to 

race condition of rings being ready or not, since during reset 

rings may be suspended. 




NAK, that's exactly what should not happen.



The scheduler should be suspended while a GPU reset is in progress.



So you are running into a completely different race here. 




Below is the series of events when the issue occurs.
(Note that, as you and Andrey mentioned, the scheduler has been suspended, but the job is scheduled nonetheless.)



amdgpu 0000:03:00.0: GPU reset begin!
...
amdgpu_device_gpu_recover stopping ring sdma0 via drm_sched_stop
...
amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
amdgpu_do_asic_reset starting to resume blocks


...


amdgpu 0000:03:00.0: couldn't schedule ib on ring <sdma0>

[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

...
amdgpu_device_ip_resume_phase2 resumed gfx_v9_0


amdgpu_device_ip_resume_phase2 resumed sdma_v4_0
amdgpu_device_ip_resume_phase2 resumed powerplay


...



FWIW, since the job is always NULL in "drm_sched_stop(&ring->sched, job ? &job->base : NULL);" when it is called during reset, all drm_sched_stop() does is cancel the delayed work and park the sched->thread. There is no job list to iterate over in order to deactivate, remove, or update fences.
Based on this analysis, adding a mutex is more failsafe and less intrusive in the current code flow, and it also seems logical, hence I devised this approach.







Please sync up with Andrey how this was able to happen. 



Regards, 

Christian. 





Shirish - Christian makes a good point. Note that in amdgpu_device_gpu_recover, drm_sched_stop, which stops all the scheduler threads, is called well before we suspend the HW in amdgpu_device_pre_asic_reset->amdgpu_device_ip_suspend, where the SDMA suspension happens and where the HW ring is marked as not ready. Please provide the call stack for when you hit

[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

to identify the code path which tried to submit the SDMA IB.
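
The ordering argument can be written down as a small table-style model. This is only a sketch of the reasoning: the step names are hypothetical labels for the stages mentioned in this thread, and the stray_unpark parameter is an assumed stand-in for anything that lets the scheduler thread run while it should be parked.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the reset stages discussed in this thread. */
enum toy_step {
    TOY_RUNNING,        /* normal operation, before the reset */
    TOY_SCHED_STOPPED,  /* after drm_sched_stop: threads parked */
    TOY_HW_SUSPENDED,   /* after amdgpu_device_ip_suspend: ring not ready */
    TOY_HW_RESUMED,     /* after amdgpu_device_ip_resume_phase2 */
    TOY_SCHED_STARTED,  /* after drm_sched_start: threads unparked */
    TOY_STEP_COUNT
};

/* The scheduler thread is expected to be parked between stop and start. */
static bool toy_thread_parked(enum toy_step step)
{
    return step >= TOY_SCHED_STOPPED && step < TOY_SCHED_STARTED;
}

/* The ring is not ready only while the hardware is suspended. */
static bool toy_ring_ready(enum toy_step step)
{
    return step != TOY_HW_SUSPENDED;
}

/*
 * Outcome of a job submission attempt at a given step:
 *   0       nothing submitted, thread is parked
 *   1       job reaches the ring
 *   -EINVAL ring not ready (the error seen in the logs)
 */
static int toy_job_result(enum toy_step step, bool stray_unpark)
{
    if (toy_thread_parked(step) && !stray_unpark)
        return 0;
    return toy_ring_ready(step) ? 1 : -EINVAL;
}
```

With stray_unpark set to false no step yields -EINVAL, so in this model the error can only appear if something lets the scheduler thread run inside the suspended window, which is why the call stack matters.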





Well the most likely cause of this is that the hardware failed to resume after the reset.


In fact, hardware resume has not yet started when the job is scheduled, which is the race I am trying to address with this patch.
Regards,
Shirish S





Christian.





Andrey








[How] 

make GPU reset's amdgpu_device_ip_resume_phase2() & 

amdgpu_ib_schedule() in amdgpu_job_run() mutually exclusive. 



Signed-off-by: Shirish S 
<shirish.s@xxxxxxx> 

--- 

  drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 1 + 

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ 

  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 2 ++ 

  3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h


index f4d9041..7b07a47b 100644 

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h 

@@ -973,6 +973,7 @@ struct amdgpu_device { 

      bool                            in_gpu_reset; 

      enum pp_mp1_state               mp1_state; 

      struct mutex  lock_reset; 

+    struct mutex  lock_ib_sched; 

      struct amdgpu_doorbell_index doorbell_index; 

        int asic_reset_res; 

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c


index 676cad1..63cad74 100644 

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 

@@ -2759,6 +2759,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, 

      mutex_init(&adev->virt.vf_errors.lock); 

      hash_init(adev->mn_hash); 

      mutex_init(&adev->lock_reset); 

+    mutex_init(&adev->lock_ib_sched); 

      mutex_init(&adev->virt.dpm_mutex); 

      mutex_init(&adev->psp.mutex); 

@@ -3795,7 +3796,9 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,


                  if (r) 

                      return r; 

+                mutex_lock(&tmp_adev->lock_ib_sched); 

                  r = amdgpu_device_ip_resume_phase2(tmp_adev); 

+                mutex_unlock(&tmp_adev->lock_ib_sched); 

                  if (r) 

                      goto out; 

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c


index e1bad99..cd6082d 100644 

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 

@@ -233,8 +233,10 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)


      if (finished->error < 0) { 

          DRM_INFO("Skip scheduling IBs!\n"); 

      } else { 

+        mutex_lock(&ring->adev->lock_ib_sched); 

          r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job, 

                         &fence); 

+        mutex_unlock(&ring->adev->lock_ib_sched); 

          if (r) 

              DRM_ERROR("Error scheduling IBs (%d)\n", r); 

      } 








_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



















