Hi Frank, well that explains why we do it, but not the background. Anyway feel free to add an Acked-by: Christian König <christian.koenig at amd.com> to the patch and commit it. Regards, Christian. Am 29.11.2017 um 04:10 schrieb Min, Frank: > Hi Christian, > I have talked with hw team for the reason why adding the masks. the answer is "bits 24-27 of the VCE_VCPU_CACHE_OFFSETx registers should be set to the cache window # (0 for window 0, 1 for window 1, etc.)" > > Best Regards, > Frank > > -----é?®ä»¶å??件----- > å??件人: Min, Frank > å??é??æ?¶é?´: 2017å¹´11æ??23æ?¥ 12:08 > æ?¶ä»¶äºº: Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org; Liu, Leo <Leo.Liu at amd.com> > 主é¢?: RE: [PATCH] drm/amdgpu: correct vce4.0 fw config for SRIOV (V2) > > Hi Leo, > Would you please comments on Christian's questions? > > Best Regards, > Frank > > -----Original Message----- > From: Min, Frank > Sent: Wednesday, November 22, 2017 4:04 PM > To: Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org; Liu, Leo <Leo.Liu at amd.com> > Subject: RE: [PATCH] drm/amdgpu: correct vce4.0 fw config for SRIOV (V2) > > Hi Christian, > Thanks again for your review. > > And for the mask change my understanding is it is to be used for mark different part of fw (1<<24 is for stack and 2<<24 is for the data). > And more detailed background would need Leo give us. > > Best Regards, > Frank > > -----Original Message----- > From: Koenig, Christian > Sent: Wednesday, November 22, 2017 3:57 PM > To: Min, Frank <Frank.Min at amd.com>; amd-gfx at lists.freedesktop.org; Liu, Leo <Leo.Liu at amd.com> > Subject: Re: [PATCH] drm/amdgpu: correct vce4.0 fw config for SRIOV (V2) > > Hi Frank, > > thanks, the patch looks much better now. > >> The masks using is to programming the stack and data part for vce fw. And this part of code is borrowed from the non-sriov sequences. > In this case Leo can you explain this strange masks used for the > VCE_VCPU_CACHE_OFFSET* registers? > >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET0), >> - offset & 0x7FFFFFFF); >> + offset & ~0x0f000000); > ... >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET1), >> - offset & 0x7FFFFFFF); >> + (offset & ~0x0f000000) | (1 << 24)); > Using ~0x0f000000 looks really odd here and what should the "| (1 << 24)" part be about? > > Thanks, > Christian. > > Am 22.11.2017 um 06:11 schrieb Min, Frank: >> Hi Christian, >> Patch updated according to your suggestions. >> The masks using is to programming the stack and data part for vce fw. And this part of code is borrowed from the non-sriov sequences. >> >> Best Regards, >> Frank >> >> 1. program vce 4.0 fw with 48 bit address 2. correct vce 4.0 fw stack >> and date offset >> >> Change-Id: Ic1bc49c21d3a90c477d11162f9d6d9e2073fbbd3 >> Signed-off-by: Frank Min <Frank.Min at amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 38 +++++++++++++++++++++++------------ >> 1 file changed, 25 insertions(+), 13 deletions(-) mode change >> 100644 => 100755 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >> b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >> old mode 100644 >> new mode 100755 >> index 7574554..024a1be >> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >> @@ -243,37 +243,49 @@ static int vce_v4_0_sriov_start(struct amdgpu_device *adev) >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> mmVCE_LMI_VM_CTRL), 0); >> >> if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { >> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >> - adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 8); >> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR1), >> - adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 8); >> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR2), >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 8); >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR0), >> + (adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 40) & >> +0xff); >> } else { >> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >> adev->vce.gpu_addr >> 8); >> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR1), >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR0), >> + (adev->vce.gpu_addr >> 40) & 0xff); >> + } >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR1), >> adev->vce.gpu_addr >> 8); >> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR2), >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR1), >> + (adev->vce.gpu_addr >> 40) & 0xff); >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR2), >> adev->vce.gpu_addr >> 8); >> - } >> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR2), >> + (adev->vce.gpu_addr >> 40) & 0xff); >> >> offset = AMDGPU_VCE_FIRMWARE_OFFSET; >> size = VCE_V4_0_FW_SIZE; >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET0), >> - offset & 0x7FFFFFFF); >> + offset & ~0x0f000000); >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> mmVCE_VCPU_CACHE_SIZE0), size); >> >> - offset += size; >> + offset = (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) ? offset >> ++ size : 0; >> size = VCE_V4_0_STACK_SIZE; >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET1), >> - offset & 0x7FFFFFFF); >> + (offset & ~0x0f000000) | (1 << 24)); >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> mmVCE_VCPU_CACHE_SIZE1), size); >> >> offset += size; >> size = VCE_V4_0_DATA_SIZE; >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET2), >> - offset & 0x7FFFFFFF); >> + (offset & ~0x0f000000) | (2 << 24)); >> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >> mmVCE_VCPU_CACHE_SIZE2), size); >> >> MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >> mmVCE_LMI_CTRL2), ~0x100, 0); >> -- >> 1.9.1 >> >> -----é?®ä»¶å??件----- >> å??件人: Christian König [mailto:ckoenig.leichtzumerken at gmail.com] >> å??é??æ?¶é?´: 2017å¹´11æ??21æ?¥ 22:45 >> æ?¶ä»¶äºº: Min, Frank <Frank.Min at amd.com>; amd-gfx at lists.freedesktop.org >> 主é¢?: Re: [PATCH] drm/amdgpu: correct vce4.0 fw config for SRIOV (V2) >> >> Am 21.11.2017 um 11:23 schrieb Frank Min: >>> 1. program vce 4.0 fw with 48 bit address 2. correct vce 4.0 fw stack >>> and date offset >>> >>> Change-Id: I835f3f52f3b29f996812a3948aabede9f2d9b056 >>> Signed-off-by: Frank Min <Frank.Min at amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 97 ++++++++++++++++++++++------------- >>> 1 file changed, 62 insertions(+), 35 deletions(-) >>> mode change 100644 => 100755 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >>> b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >>> old mode 100644 >>> new mode 100755 >>> index 7574554..dc7b615 >>> --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c >>> @@ -243,59 +243,86 @@ static int vce_v4_0_sriov_start(struct amdgpu_device *adev) >>> MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> mmVCE_LMI_VM_CTRL), 0); >>> >>> if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >>> - adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 8); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR1), >>> - adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 8); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR2), >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >>> adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 8); >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR0), >>> + (adev->firmware.ucode[AMDGPU_UCODE_ID_VCE].mc_addr >> 40) & >>> +0xff); >>> } else { >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR0), >>> adev->vce.gpu_addr >> 8); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR1), >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR0), >>> + (adev->vce.gpu_addr >> 40) & 0xff); >>> + } >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR1), >>> adev->vce.gpu_addr >> 8); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_VCPU_CACHE_40BIT_BAR2), >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR1), >>> + (adev->vce.gpu_addr >> 40) & 0xff); >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_40BIT_BAR2), >>> adev->vce.gpu_addr >> 8); >>> - } >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_VCPU_CACHE_64BIT_BAR2), >>> + (adev->vce.gpu_addr >> 40) & 0xff); >>> >>> offset = AMDGPU_VCE_FIRMWARE_OFFSET; >>> size = VCE_V4_0_FW_SIZE; >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET0), >>> - offset & 0x7FFFFFFF); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_SIZE0), size); >>> - >>> - offset += size; >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CACHE_OFFSET0), >>> + offset & ~0x0f000000); >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CACHE_SIZE0), size); >>> + >>> + offset = (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) ? >>> + offset + size : 0; >>> size = VCE_V4_0_STACK_SIZE; >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET1), >>> - offset & 0x7FFFFFFF); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_SIZE1), size); >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CACHE_OFFSET1), >>> + (offset & ~0x0f000000) | (1 << 24)); >> That mask still looks incorrect to me. >> >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CACHE_SIZE1), size); >>> >>> offset += size; >>> size = VCE_V4_0_DATA_SIZE; >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_OFFSET2), >>> - offset & 0x7FFFFFFF); >>> - MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CACHE_SIZE2), size); >>> - >>> - MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_LMI_CTRL2), ~0x100, 0); >>> - MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_SYS_INT_EN), >>> - VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK, >>> - VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK); >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CACHE_OFFSET2), >>> + (offset & ~0x0f000000) | (2 << 24)); >> Dito. >> >>> + MMSCH_V1_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CACHE_SIZE2), size); >>> + >>> + MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_LMI_CTRL2), ~0x100, 0); >>> + MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_SYS_INT_EN), >>> + VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK, >>> + VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK); >>> >>> /* end of MC_RESUME */ >>> - MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_STATUS), >>> - VCE_STATUS__JOB_BUSY_MASK, ~VCE_STATUS__JOB_BUSY_MASK); >>> - MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_VCPU_CNTL), >>> - ~0x200001, VCE_VCPU_CNTL__CLK_EN_MASK); >>> - MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_SOFT_RESET), >>> - ~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK, 0); >>> + MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_STATUS), >>> + VCE_STATUS__JOB_BUSY_MASK, >>> + ~VCE_STATUS__JOB_BUSY_MASK); >>> + MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_VCPU_CNTL), >>> + ~0x200001, >>> + VCE_VCPU_CNTL__CLK_EN_MASK); >>> + MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_SOFT_RESET), >>> + ~VCE_SOFT_RESET__ECPU_SOFT_RESET_MASK, 0); >> Unrelated coding style change, please concentrate on the functional change for this patch. >> >>> >>> MMSCH_V1_0_INSERT_DIRECT_POLL(SOC15_REG_OFFSET(VCE, 0, mmVCE_STATUS), >>> - VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK, >>> - VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK); >>> + VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK, >>> + VCE_STATUS_VCPU_REPORT_FW_LOADED_MASK); >> Here the indentation is wrong. Looks like it was correct before the change. >> >> Regards, >> Christian. >> >>> >>> /* clear BUSY flag */ >>> - MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, mmVCE_STATUS), >>> - ~VCE_STATUS__JOB_BUSY_MASK, 0); >>> + MMSCH_V1_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCE, 0, >>> + mmVCE_STATUS), >>> + ~VCE_STATUS__JOB_BUSY_MASK, 0); >>> >>> /* add end packet */ >>> memcpy((void *)init_table, &end, sizeof(struct >>> mmsch_v1_0_cmd_end));