>-----Original Message----- >From: Koenig, Christian >Sent: Tuesday, August 14, 2018 4:38 PM >To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu/sriov: For sriov runtime, use kiq to do >invalidate tlb > >Am 14.08.2018 um 10:35 schrieb Deng, Emily: >>> -----Original Message----- >>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of >>> Christian König >>> Sent: Tuesday, August 14, 2018 4:32 PM >>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >>> Subject: Re: [PATCH] drm/amdgpu/sriov: For sriov runtime, use kiq to >>> do invalidate tlb >>> >>> Am 14.08.2018 um 10:26 schrieb Deng, Emily: >>>>> -----Original Message----- >>>>> From: Christian König <ckoenig.leichtzumerken at gmail.com> >>>>> Sent: Tuesday, August 14, 2018 4:20 PM >>>>> To: Deng, Emily <Emily.Deng at amd.com>; Koenig, Christian >>>>> <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org >>>>> Subject: Re: [PATCH] drm/amdgpu/sriov: For sriov runtime, use kiq >>>>> to do invalidate tlb >>>>> >>>>> Am 14.08.2018 um 10:13 schrieb Deng, Emily: >>>>>>> -----Original Message----- >>>>>>> From: Christian König <ckoenig.leichtzumerken at gmail.com> >>>>>>> Sent: Tuesday, August 14, 2018 3:54 PM >>>>>>> To: Deng, Emily <Emily.Deng at amd.com>; >>>>>>> amd-gfx at lists.freedesktop.org >>>>>>> Subject: Re: [PATCH] drm/amdgpu/sriov: For sriov runtime, use kiq >>>>>>> to do invalidate tlb >>>>>>> >>>>>>> Am 14.08.2018 um 09:46 schrieb Emily Deng: >>>>>>>> To avoid the tlb flush not interrupted by world switch, use kiq >>>>>>>> and one command to do tlb invalidate. >>>>>>> Well NAK, this just duplicates the TLB handling and moves it >>>>>>> outside of the GMC code. >>>>>> No, it not duplicates the TLB handling, it only send one command, >>>>>> and not duplicate. With kiq, it only use one command to do the >>>>>> invalidate tlb and >>>>> wait ack, and won't be interrupted by world switch. >>>>> >>>>> That is effectively duplicating the TLB handling. >>>>> >>>>>>> Instead just lower the timeout and suppress the warning when >>>>>>> SRIOV is >>>>> active. >>>>>> With the kiq to do tlb flush, no warning issue expected. >>>>> Either fix that issue thoughtfully or let us live with the warning message. >>>>> >>>> Sorry, I don't know your mean, with one command, it will have no any >>>> warning. It will wait the ack in the same command. >>>>> But please no more SRIOV workaround we can't test on bare metal and >>>>> break things with older firmware. >>>> It seems a workaround here, but it truly is not a workaround. I >>>> think it is doing >>> the right thing. >>>> As the new firmware support the one command, why we need use two >>>> commands to do the invalidate and wait ack, and we don't know when >>>> it will >>> sent the ack back? >>> >>> The problem is that you depend on new firmware here. >>> See firmware and driver are distributed separately and we got into >>> quite a bunch of problems because the ring_emit_reg_write_reg_wait() >>> isn't correctly implemented. >> So it uses sriov to distinguish this, as sriov always use the tip firmware. > >That assumption is not correct. You SRIOV users should use the tip firmware, >but there is no guarantee that they are actually doing it. > >> Do you think need to add the firmware version check here? > >Yes, please do so. Ok, I will send a v2 patch >> And also add to the below code? > >No, probably best approach would be to use the same code path for both bare >metal and SRIOV. But the bare metal doesn't have world switch, the SRIOV has, and don't know when it will be switched back to this VF. So I think for SRIOV, it will be better to use one command. And it have another issue that I think you already know what I am mentioning. > >Regards, >Christian. > >> Best wishes >> Emily Deng >> >>> See gfx_v9_0_ring_emit_reg_write_reg_wait as well: >>>> static void gfx_v9_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring >>>> *ring, >>>>                                                  uint32_t reg0, >>>> uint32_t reg1, >>>>                                                  uint32_t ref, >>>> uint32_t mask) { >>>>        int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX); >>>> >>>>        if (amdgpu_sriov_vf(ring->adev)) >>>>                gfx_v9_0_wait_reg_mem(ring, usepfp, 0, 1, reg0, >>>> reg1, >>>>                                      ref, mask, 0x20); >>>>        else >>>>                amdgpu_ring_emit_reg_write_reg_wait_helper(ring, >>>> reg0, reg1, >>>>                                                           ref, >>>> mask); } >>> Before we add any more broken workarounds like that please fix the >>> existing code to check for the for the firmware version and NOT if >>> SRIOV is present or not. >>> >>> Regards, >>> Christian. >>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> Best wishes >>>>>> Emily Deng >>>>>>> Christian. >>>>>>> >>>>>>>> SWDEV-161497 >>>>>>>> >>>>>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 50 >>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 ++ >>>>>>>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 ++++ >>>>>>>> 3 files changed, 57 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>>>>>>> index 21adb1b6..aa6ddcc 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>>>>>>> @@ -233,6 +233,56 @@ void amdgpu_virt_kiq_wreg(struct >>>>> amdgpu_device >>>>>>> *adev, uint32_t reg, uint32_t v) >>>>>>>> pr_err("failed to write reg:%x\n", reg); >>>>>>>> } >>>>>>>> >>>>>>>> +void amdgpu_virt_kiq_invalidate_tlb(struct amdgpu_device *adev, >>>>>>>> +struct >>>>>>> amdgpu_vmhub *hub, >>>>>>>> + unsigned eng, u32 req, uint32_t vmid) { >>>>>>>> + signed long r, cnt = 0; >>>>>>>> + unsigned long flags; >>>>>>>> + uint32_t seq; >>>>>>>> + struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>>>>>>> + struct amdgpu_ring *ring = &kiq->ring; >>>>>>>> + >>>>>>>> + BUG_ON(!ring->funcs->emit_reg_write_reg_wait); >>>>>>>> + >>>>>>>> + spin_lock_irqsave(&kiq->ring_lock, flags); >>>>>>>> + amdgpu_ring_alloc(ring, 32); >>>>>>>> + amdgpu_ring_emit_reg_write_reg_wait(ring, hub- >>vm_inv_eng0_req >>>>>>>> ++ >>>>>>> eng, >>>>>>>> + hub->vm_inv_eng0_ack + eng, >>>>>>>> + req, 1 << vmid); >>>>>>>> + amdgpu_fence_emit_polling(ring, &seq); >>>>>>>> + amdgpu_ring_commit(ring); >>>>>>>> + spin_unlock_irqrestore(&kiq->ring_lock, flags); >>>>>>>> + >>>>>>>> + r = amdgpu_fence_wait_polling(ring, seq, >MAX_KIQ_REG_WAIT); >>>>>>>> + >>>>>>>> + /* don't wait anymore for gpu reset case because this way may >>>>>>>> + * block gpu_recover() routine forever, e.g. this virt_kiq_rreg >>>>>>>> + * is triggered in TTM and ttm_bo_lock_delayed_workqueue() >will >>>>>>>> + * never return if we keep waiting in virt_kiq_rreg, which cause >>>>>>>> + * gpu_recover() hang there. >>>>>>>> + * >>>>>>>> + * also don't wait anymore for IRQ context >>>>>>>> + * */ >>>>>>>> + if (r < 1 && (adev->in_gpu_reset || in_interrupt())) >>>>>>>> + goto failed_kiq; >>>>>>>> + >>>>>>>> + if (in_interrupt()) >>>>>>>> + might_sleep(); >>>>>>>> + >>>>>>>> + while (r < 1 && cnt++ < MAX_KIQ_REG_TRY) { >>>>>>>> + msleep(MAX_KIQ_REG_BAILOUT_INTERVAL); >>>>>>>> + r = amdgpu_fence_wait_polling(ring, seq, >>>>>>> MAX_KIQ_REG_WAIT); >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (cnt > MAX_KIQ_REG_TRY) >>>>>>>> + goto failed_kiq; >>>>>>>> + >>>>>>>> + return; >>>>>>>> + >>>>>>>> +failed_kiq: >>>>>>>> + pr_err("failed to invalidate tlb with kiq\n"); } >>>>>>>> + >>>>>>>> /** >>>>>>>> * amdgpu_virt_request_full_gpu() - request full gpu access >>>>>>>> * @amdgpu: amdgpu device. >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>>>>>>> index 880ac11..a2e3c78 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>>>>>>> @@ -288,6 +288,8 @@ void amdgpu_free_static_csa(struct >>>>> amdgpu_device >>>>>>> *adev); >>>>>>>> void amdgpu_virt_init_setting(struct amdgpu_device *adev); >>>>>>>> uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, >>>>>>>> uint32_t >>>>> reg); >>>>>>>> void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, >>>>>>>> uint32_t reg, uint32_t v); >>>>>>>> +void amdgpu_virt_kiq_invalidate_tlb(struct amdgpu_device *adev, >>>>>>>> +struct >>>>>>> amdgpu_vmhub *hub, >>>>>>>> + unsigned eng, u32 req, uint32_t vmid); >>>>>>>> int amdgpu_virt_request_full_gpu(struct amdgpu_device >>>>>>>> *adev, bool >>> init); >>>>>>>> int amdgpu_virt_release_full_gpu(struct amdgpu_device >>>>>>>> *adev, bool >>> init); >>>>>>>> int amdgpu_virt_reset_gpu(struct amdgpu_device *adev); diff >>>>>>>> --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>>>>>>> index ed467de..6a886d9 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>>>>>>> @@ -339,6 +339,11 @@ static void gmc_v9_0_flush_gpu_tlb(struct >>>>>>> amdgpu_device *adev, >>>>>>>> struct amdgpu_vmhub *hub = &adev->vmhub[i]; >>>>>>>> u32 tmp = gmc_v9_0_get_invalidate_req(vmid); >>>>>>>> >>>>>>>> + if (amdgpu_sriov_vf(adev) && >amdgpu_sriov_runtime(adev)) { >>>>>>>> + amdgpu_virt_kiq_invalidate_tlb(adev, hub, eng, >tmp, >>>>>>> vmid); >>>>>>>> + continue; >>>>>>>> + } >>>>>>>> + >>>>>>>> WREG32_NO_KIQ(hub->vm_inv_eng0_req + eng, tmp); >>>>>>>> >>>>>>>> /* Busy wait for ACK.*/ >>>>>> _______________________________________________ >>>>>> amd-gfx mailing list >>>>>> amd-gfx at lists.freedesktop.org >>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx at lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx