Then let's forget the proposal I provided before.
I think the sequence below may fix the race condition issue that we
are facing.
1. Stop scheduling new jobs:

	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
		if (adev->vcn.harvest_config & (1 << i))
			continue;
		ring = &adev->vcn.inst[i].ring_dec;
		ring->sched.ready = false;
		for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
			ring = &adev->vcn.inst[i].ring_enc[j];
			ring->sched.ready = false;
		}
	}
2. cancel_delayed_work_sync(&adev->vcn.idle_work);
3. SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
	UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
4. amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_GATE);
5. saved_bo
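To illustrate, the sequence above can be sketched as a self-contained userspace mock. Everything here (fake_inst, wait_power_idle, STATUS_IDLE, the register values) is a made-up stand-in for the real ring scheduler flag, SOC15_WAIT_ON_RREG poll and powergating call; step 5 (saving the BO) is omitted:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_INST       2
#define NUM_ENC_RINGS  2
#define STATUS_IDLE    1   /* mock stand-in for the UVD_POWER_STATUS idle value */

struct fake_ring { bool sched_ready; };

struct fake_inst {
	struct fake_ring ring_dec;
	struct fake_ring ring_enc[NUM_ENC_RINGS];
	int power_status;  /* mock register */
	bool gated;
};

static struct fake_inst inst[NUM_INST];

/* Step 1: stop scheduling new jobs on every ring. */
static void stop_scheduling(void)
{
	for (int i = 0; i < NUM_INST; ++i) {
		inst[i].ring_dec.sched_ready = false;
		for (int j = 0; j < NUM_ENC_RINGS; ++j)
			inst[i].ring_enc[j].sched_ready = false;
	}
}

/* Step 3: poll the mock power-status register with a bounded retry count. */
static int wait_power_idle(int i, int retries)
{
	while (retries--) {
		if (inst[i].power_status == STATUS_IDLE)
			return 0;
		/* real code would udelay() between reads */
	}
	return -1;  /* timed out */
}

/* Steps 1, 3 and 4 combined: stop rings, wait for idle, then gate.
 * Step 2 (cancel_delayed_work_sync) has no userspace equivalent here. */
static int suspend_sequence(void)
{
	stop_scheduling();
	for (int i = 0; i < NUM_INST; ++i) {
		if (wait_power_idle(i, 10))
			return -1;
		inst[i].gated = true;  /* step 4: power gate */
	}
	return 0;
}

/* Test helper: pretend the firmware has finished all jobs. */
static void mock_fw_idle(void)
{
	for (int i = 0; i < NUM_INST; ++i)
		inst[i].power_status = STATUS_IDLE;
}
```

The point of the ordering is that no new job can be scheduled after step 1, so the poll in step 3 only has to outwait work already in flight.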
Best Regards!
James
On 2021-05-17 12:54 p.m., James Zhu wrote:
I am wondering: if there are still some jobs kept in the queue, we may just be lucky to see
UVD_POWER_STATUS done, but afterwards the firmware starts a new job that is still listed in the queue.
Yes, it's possible; in this case the delayed handler is set, so cancelling once is enough.
To handle this situation perfectly, we need to add a mechanism to suspend the firmware first.
I think that should be handled by the sequence from vcn_v3_0_stop_dpg_mode().
In another, unlucky case, the VCN firmware may hang at that time, so UVD_POWER_STATUS
always reads busy; then we need to force power gating the VCN hardware after waiting a certain amount of time.
Yep, we still need to gate VCN power after a certain timeout.
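The timeout-then-force-gate decision being discussed can be reduced to a tiny sketch (need_force_gate and the busy callbacks are hypothetical names, not driver API):

```c
#include <assert.h>
#include <stdbool.h>

/* Mock: returns true while the firmware still reports busy. */
typedef bool (*busy_fn)(void);

/* Poll up to max_retries times for the firmware to go idle.
 * Returns true when it never does (hung fw), meaning the caller
 * should force power gating instead of waiting forever. */
static bool need_force_gate(busy_fn busy, int max_retries)
{
	while (max_retries--) {
		if (!busy())
			return false;  /* fw went idle in time */
		/* real code would delay between polls */
	}
	return true;  /* timed out: force gate */
}

/* Two mock firmware states for testing. */
static bool always_busy(void) { return true; }
static bool never_busy(void)  { return false; }
```

With a hung firmware the loop exhausts its budget and reports that the forced gate path must run; with an idle firmware it returns immediately.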
Regards,
Leo
Best Regards!
James
On 2021-05-17 12:34 p.m., Leo Liu wrote:
On 2021-05-17 11:52 a.m., James Zhu wrote:
During VCN suspend, stop the rings so they no longer receive new requests,
and try to wait for all VCN jobs to finish gracefully.
v2: Force power gating the VCN hardware after a few waiting retries.
Signed-off-by: James Zhu <James.Zhu@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 2016459..9f3a6e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct amdgpu_device *adev)
{
unsigned size;
void *ptr;
+	struct amdgpu_ring *ring;
+	int retry_max = 6, j;
int i;
- cancel_delayed_work_sync(&adev->vcn.idle_work);
+ for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
+ if (adev->vcn.harvest_config & (1 << i))
+ continue;
+ ring = &adev->vcn.inst[i].ring_dec;
+ ring->sched.ready = false;
+
+ for (j = 0; j < adev->vcn.num_enc_rings; ++j) {
+ ring = &adev->vcn.inst[i].ring_enc[j];
+ ring->sched.ready = false;
+ }
+ }
+
+ while (retry_max-- && cancel_delayed_work_sync(&adev->vcn.idle_work))
+ mdelay(5);
I think it's possible to have one pending job unprocessed by VCN when the suspend sequence gets here, but it shouldn't be more than one. cancel_delayed_work_sync will probably return false after the first time, so calling cancel_delayed_work_sync once should be enough here. We probably need to wait longer with:
SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
to make sure the unprocessed job gets done.
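The shape of that register wait can be mocked in plain C (wait_on_reg and mock_rreg are stand-ins sketching the SOC15_WAIT_ON_RREG pattern, not the real macro):

```c
#include <assert.h>

/* Sketch of a SOC15_WAIT_ON_RREG-style poll: read the register via a
 * callback until (value & mask) == expect, or the retry budget runs
 * out. A larger budget gives an in-flight job time to finish. */
static int wait_on_reg(unsigned (*rreg)(void), unsigned expect,
		       unsigned mask, int retries)
{
	while (retries--) {
		if ((rreg() & mask) == expect)
			return 0;
		/* the real macro udelay()s between reads */
	}
	return -1;  /* the real code returns -ETIMEDOUT */
}

/* Mock register that reads busy (2) twice, then idle (1). */
static unsigned calls;
static unsigned mock_rreg(void)
{
	return ++calls >= 3 ? 1u : 2u;
}
```

With a retry budget of zero the poll times out immediately; with a sufficient budget it sees the register settle and succeeds.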
Regards,
Leo
+ if (!retry_max && !amdgpu_sriov_vf(adev)) {
+ if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
+ dev_warn(adev->dev, "Forced powering gate vcn hardware!");
+ vcn_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
+ }
+ }
for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
if (adev->vcn.harvest_config & (1 << i))
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx