On 10/9/2024 3:34 AM, Boyuan Zhang wrote:
>
> On 2024-10-08 03:03, Lazar, Lijo wrote:
>>
>> On 10/7/2024 8:54 PM, Alex Deucher wrote:
>>> On Mon, Oct 7, 2024 at 10:32 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>>>>
>>>> On 10/7/2024 7:47 PM, Alex Deucher wrote:
>>>>> On Mon, Oct 7, 2024 at 9:58 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>>>>>>
>>>>>> On 10/7/2024 7:03 PM, Boyuan Zhang wrote:
>>>>>>> On 2024-10-07 01:22, Lazar, Lijo wrote:
>>>>>>>> On 10/5/2024 12:14 AM, boyuan.zhang@xxxxxxx wrote:
>>>>>>>>> From: Boyuan Zhang <boyuan.zhang@xxxxxxx>
>>>>>>>>>
>>>>>>>>> For vcn 2_5, add an ip_block for each vcn instance during the
>>>>>>>>> discovery stage.
>>>>>>>>>
>>>>>>>>> And only power on/off one of the vcn instances, using the instance
>>>>>>>>> value stored in ip_block, instead of powering on/off all vcn
>>>>>>>>> instances. Modify the existing functions to use the instance value
>>>>>>>>> in ip_block, and remove the original for loop over all vcn
>>>>>>>>> instances.
>>>>>>>>>
>>>>>>>>> v2: rename "i"/"j" to "inst" for the instance value.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Boyuan Zhang <boyuan.zhang@xxxxxxx>
>>>>>>>>> ---
>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |   4 +-
>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c         | 565 +++++++++---------
>>>>>>>>>  2 files changed, 280 insertions(+), 289 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>>>>>>>>> index 9f9a1867da72..de1053cc2a2b 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
>>>>>>>>> @@ -2278,6 +2278,7 @@ static int amdgpu_discovery_set_sdma_ip_blocks(struct amdgpu_device *adev)
>>>>>>>>>  static int amdgpu_discovery_set_mm_ip_blocks(struct amdgpu_device *adev)
>>>>>>>>>  {
>>>>>>>>> +    int i;
>>>>>>>>>      if (amdgpu_ip_version(adev, VCE_HWIP, 0)) {
>>>>>>>>>          switch (amdgpu_ip_version(adev, UVD_HWIP, 0)) {
>>>>>>>>>          case IP_VERSION(7, 0, 0):
>>>>>>>>> @@ -2321,7 +2322,8 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct amdgpu_device *adev)
>>>>>>>>>          case IP_VERSION(2, 0, 3):
>>>>>>>>>              break;
>>>>>>>>>          case IP_VERSION(2, 5, 0):
>>>>>>>>> -            amdgpu_device_ip_block_add(adev, &vcn_v2_5_ip_block);
>>>>>>>>> +            for (i = 0; i < adev->vcn.num_vcn_inst; ++i)
>>>>>>>>> +                amdgpu_device_ip_block_add(adev, &vcn_v2_5_ip_block);
>>>>>>>> This introduces a totally confusing design now. At a higher level, an
>>>>>>>> IP block type manages multiple instances and their power states. Now
>>>>>>>> there is a mix where no definition can be attributed to an IP block.
>>>>>>>> Or, if this were to be done uniformly for other IPs, then for some
>>>>>>>> SOCs there could be 16 SDMA blocks, 8 GFX blocks, and so forth.
>>>>>>>>
>>>>>>>> What is the reason to do it this way? Can't the VCN IP block maintain
>>>>>>>> the state of multiple instances within itself?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Lijo
>>>>>>> This is part of the fundamental design change for separating the IP
>>>>>>> block per instance, in order to handle each instance separately within
>>>>>>> the same IP block (e.g. separate power/clock management).
>>>>>>>
>>>>>>> Part 1: Change all adev pointer handles to amdgpu_ip_block pointers in
>>>>>>> all callbacks (hw_init/fini, sw_init/fini, suspend, etc.) for all IP
>>>>>>> blocks, so that each IP knows which IP block (and which instance) the
>>>>>>> callback is for. Most of these changes have been submitted by Sunil.
>>>>>>>
>>>>>>> Part 2: Separate the IP block per instance.
>>>>>>>
>>>>>>> Part 3: Since callbacks can pass in the IP block after the Part 1
>>>>>>> change, and the instance value can be obtained from each IP block
>>>>>>> after Part 2, an IP can then choose to handle callbacks ONLY for the
>>>>>>> given instance of a given IP block (or keep handling all instances as
>>>>>>> before). For VCN, all callbacks will be handled only for the given
>>>>>>> instance, instead of the original for-loop over all instances. As a
>>>>>>> result, each VCN instance can be started/stopped/initialized/
>>>>>>> finalized/suspended separately.
>>>>>>>
>>>>>>> Part 4: Change all VCN helper functions to handle only the given
>>>>>>> instance, instead of the original for-loop over all instances.
>>>>>>>
>>>>>>> Each instance can have its own state, so we think it makes more sense
>>>>>>> to separate them at the IP block level and handle each of them
>>>>>>> separately.
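As an illustration of the callback shape described in Part 3 above, a
minimal sketch, assuming the instance field this series adds to struct
amdgpu_ip_block; the vcn_vX_hw_init()/vcn_vX_start() names are made up
for the example, not code from the series:

    static int vcn_vX_hw_init(struct amdgpu_ip_block *ip_block)
    {
        struct amdgpu_device *adev = ip_block->adev;
        int inst = ip_block->instance;

        /* act only on the one VCN instance this block represents */
        if (adev->vcn.harvest_config & (1 << inst))
            return 0;

        return vcn_vX_start(adev, inst);
    }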
>>>>>> Such a change should not be done unless all IPs follow the same design.
>>>>>> You didn't answer the question - what necessitates this change? What is
>>>>>> special about VCN that it cannot manage the states of multiple
>>>>>> instances within the IP block?
>>>>> We want to be able to manage the powergating independently for each
>>>>> VCN instance, for both power management and VCN reset. Right now power
>>>>> gating is handled at the IP level, so it's not easy or clean to handle
>>>>> powergating of individual IP instances.
>>>> Still, the VCN block can manage the powergated instances (FWIW, it's
>>>> just an array in the SMU block). Also, the vcn block gets to run the
>>>> idle worker and knows the rings (and corresponding VCN instances) that
>>>> are idle. Maintaining instance states in the VCN block and modifying
>>>> the idle worker to idle just that one instance doesn't look like a
>>>> complex change.
>>> We already went down that road and it's really ugly. We end up
>>> duplicating a bunch of code paths for different driver flows because
>>> set_powergating_state() and set_clockgating_state() are at the IP level
>>> and we want per-instance gating. We could add a num_instances member
>>> at the IP block level and then convert all of the high-level code that
>>> calls the IP functions to loop over the number of instances, but that
>>> seems like just as much work and it's not quite as clean IMHO.
>> Sorry, I'm not seeing much difference in this design. All it does is
>> split instances into IP blocks, while at the same time keeping an
>> instance parameter in the IP block APIs.
>>
>> amdgpu_device_ip_set_clockgating_state(blocktype, instance);
>> amdgpu_device_ip_set_powergating_state(blocktype, instance);
>>
>> Also, VCN continues to maintain an array of amdgpu_vcn_inst. I think
>> it's easier to manage this with changes to amdgpu_vcn_inst. Since it
>> continues to be maintained, what about just moving the states and
>> idle_work inside amdgpu_vcn_inst?
>>
>>     int inst;
>>     enum amd_powergating_state cur_pg_state;
>>     enum amd_clockgating_state cur_cg_state;
>>     struct delayed_work idle_work;
>>
>> At the end of ring usage of the corresponding instance, just invoke that
>> instance's idle work:
>>
>> schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
>>                       idle_time_out);
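A rough sketch of how that alternative could hang together, assuming the
fields proposed above plus a back-pointer from amdgpu_vcn_inst to the
device; the handler name and container_of() recovery are illustrative,
and the enc-ring fence checks are elided for brevity:

    static void vcn_inst_idle_work_handler(struct work_struct *work)
    {
        struct amdgpu_vcn_inst *vcn_inst =
            container_of(work, struct amdgpu_vcn_inst, idle_work.work);
        struct amdgpu_device *adev = vcn_inst->adev;

        /* power gate only this instance once its rings go idle */
        if (!amdgpu_fence_count_emitted(&vcn_inst->ring_dec)) {
            vcn_inst->cur_pg_state = AMD_PG_STATE_GATE;
            amdgpu_dpm_enable_vcn(adev, false, vcn_inst->inst);
        } else {
            schedule_delayed_work(&vcn_inst->idle_work, VCN_IDLE_TIMEOUT);
        }
    }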
> First of all, separating the idle work and state is still needed in the
> current design. Separating idle work by instance is already handled in
> patch 17/18. Separating the power gating state is already handled in
> patch 10/18.
>
> I agree that adding an instance variable to amdgpu_vcn_inst requires
> much less effort, but as mentioned by Alex previously, we already went
> down that road of using an "inst" variable in amdgpu_vcn_inst as you
> listed above and tracking this instance value all the way from VCN to
> SMU. However, this is a no-go based on discussions with Christian and
> Alex. Since set_powergating_state() is at the IP level, it will be clean
> to do per-instance gating at the IP level. With the change of passing
> the ip_block to the callback functions, all IP functions can now choose
> to handle only the given instance, which is a clean separation between
> multiple instances.

I don't agree. There is no clean separation here. What I see here is a
mix where the ip block still remains a dummy.

    ipblock->inst => calls vcn.vcn_inst[i], which maintains the final state.

If the design were based only on the ip block, you wouldn't have needed
to maintain another vcn_inst. Everything would have been handled by
maintaining the state in the ip block itself.

>> I don't see any change in this series for other IP block APIs.
> Yes, currently we only do this per-instance IP block for VCN. For the
> long term, the plan is to change all other IPs with multiple instances
> over to this design.

This path itself is not a viable solution. We will have cases where a
group of ip blocks of the same type needs to be handled together. At
that point you will need another central controller for the ip blocks
belonging to a type. That will eventually lead to another superblock
being created.

Thanks,
Lijo

> Regards,
> Boyuan
>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>>> Moving to an IP block per instance for VCN alone is not a change that
>>>> helps to define an IP block. If that needs to be done for every other
>>>> IP type, that's also a massive change.
>>>>
>>>> Also, then it's no longer possible to have something static like this -
>>>> struct amdgpu_ip_block ip_blocks[AMDGPU_MAX_IP_NUM];
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> Alex
>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>>> Thanks,
>>>>>>> Boyuan
>>>>>>>>>             amdgpu_device_ip_block_add(adev, &jpeg_v2_5_ip_block);
>>>>>>>>>             break;
>>>>>>>>>         case IP_VERSION(2, 6, 0):
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>>>>>>>>> index d00df51bc400..1f8738ae360a 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
>>>>>>>>> @@ -158,35 +158,34 @@ static int vcn_v2_5_early_init(struct amdgpu_ip_block *ip_block)
>>>>>>>>>  static int vcn_v2_5_sw_init(struct amdgpu_ip_block *ip_block)
>>>>>>>>>  {
>>>>>>>>>     struct amdgpu_ring *ring;
>>>>>>>>> -   int i, j, r;
>>>>>>>>> +   int i, r;
>>>>>>>>>     uint32_t reg_count = ARRAY_SIZE(vcn_reg_list_2_5);
>>>>>>>>>     uint32_t *ptr;
>>>>>>>>>     struct amdgpu_device *adev = ip_block->adev;
>>>>>>>>> +   int inst = ip_block->instance;
>>>>>>>>> -   for (j = 0; j < adev->vcn.num_vcn_inst; j++) {
>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << j))
>>>>>>>>> -           continue;
>>>>>>>>> -       /* VCN DEC TRAP */
>>>>>>>>> -       r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[j],
>>>>>>>>> -           VCN_2_0__SRCID__UVD_SYSTEM_MESSAGE_INTERRUPT, &adev->vcn.inst[j].irq);
>>>>>>>>> -       if (r)
>>>>>>>>> -           return r;
>>>>>>>>> -
>>>>>>>>> -       /* VCN ENC TRAP */
>>>>>>>>> -       for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>>>> -           r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[j],
>>>>>>>>> -               i + VCN_2_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst[j].irq);
>>>>>>>>> -           if (r)
>>>>>>>>> -               return r;
>>>>>>>>> -       }
>>>>>>>>> +   if (adev->vcn.harvest_config & (1 << inst))
>>>>>>>>> +       goto sw_init;
>>>>>>>>> +   /* VCN DEC TRAP */
>>>>>>>>> +   r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[inst],
>>>>>>>>> +       VCN_2_0__SRCID__UVD_SYSTEM_MESSAGE_INTERRUPT, &adev->vcn.inst[inst].irq);
>>>>>>>>> +   if (r)
>>>>>>>>> +       return r;
>>>>>>>>> -   /* VCN POISON TRAP */
>>>>>>>>> -   r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[j],
>>>>>>>>> -       VCN_2_6__SRCID_UVD_POISON, &adev->vcn.inst[j].ras_poison_irq);
>>>>>>>>> +   /* VCN ENC TRAP */
>>>>>>>>> +   for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>>>> +       r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[inst],
>>>>>>>>> +           i + VCN_2_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst[inst].irq);
>>>>>>>>>         if (r)
>>>>>>>>>             return r;
>>>>>>>>>     }
>>>>>>>>> +   /* VCN POISON TRAP */
>>>>>>>>> +   r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[inst],
>>>>>>>>> +       VCN_2_6__SRCID_UVD_POISON, &adev->vcn.inst[inst].ras_poison_irq);
>>>>>>>>> +   if (r)
>>>>>>>>> +       return r;
>>>>>>>>> +sw_init:
>>>>>>>>>     r = amdgpu_vcn_sw_init(adev);
>>>>>>>>>     if (r)
>>>>>>>>>         return r;
>>>>>>>>> @@ -197,76 +196,74 @@ static int vcn_v2_5_sw_init(struct amdgpu_ip_block *ip_block)
>>>>>>>>>     if (r)
>>>>>>>>>         return r;
>>>>>>>>> -   for (j = 0; j < adev->vcn.num_vcn_inst; j++) {
>>>>>>>>> -       volatile struct amdgpu_fw_shared *fw_shared;
>>>>>>>>> +   volatile struct amdgpu_fw_shared *fw_shared;
>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << j))
>>>>>>>>> -           continue;
>>>>>>>>> -       adev->vcn.internal.context_id = mmUVD_CONTEXT_ID_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.internal.ib_vmid = mmUVD_LMI_RBC_IB_VMID_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.internal.ib_bar_low = mmUVD_LMI_RBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.internal.ib_bar_high = mmUVD_LMI_RBC_IB_64BIT_BAR_HIGH_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.internal.ib_size = mmUVD_RBC_IB_SIZE_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.internal.gp_scratch8 = mmUVD_GP_SCRATCH8_INTERNAL_OFFSET;
>>>>>>>>> -
>>>>>>>>> -       adev->vcn.internal.scratch9 = mmUVD_SCRATCH9_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.inst[j].external.scratch9 = SOC15_REG_OFFSET(VCN, j, mmUVD_SCRATCH9);
>>>>>>>>> -       adev->vcn.internal.data0 = mmUVD_GPCOM_VCPU_DATA0_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.inst[j].external.data0 = SOC15_REG_OFFSET(VCN, j, mmUVD_GPCOM_VCPU_DATA0);
>>>>>>>>> -       adev->vcn.internal.data1 = mmUVD_GPCOM_VCPU_DATA1_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.inst[j].external.data1 = SOC15_REG_OFFSET(VCN, j, mmUVD_GPCOM_VCPU_DATA1);
>>>>>>>>> -       adev->vcn.internal.cmd = mmUVD_GPCOM_VCPU_CMD_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.inst[j].external.cmd = SOC15_REG_OFFSET(VCN, j, mmUVD_GPCOM_VCPU_CMD);
>>>>>>>>> -       adev->vcn.internal.nop = mmUVD_NO_OP_INTERNAL_OFFSET;
>>>>>>>>> -       adev->vcn.inst[j].external.nop = SOC15_REG_OFFSET(VCN, j, mmUVD_NO_OP);
>>>>>>>>> -
>>>>>>>>> -       ring = &adev->vcn.inst[j].ring_dec;
>>>>>>>>> +   if (adev->vcn.harvest_config & (1 << inst))
>>>>>>>>> +       goto done;
>>>>>>>>> +   adev->vcn.internal.context_id = mmUVD_CONTEXT_ID_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.internal.ib_vmid = mmUVD_LMI_RBC_IB_VMID_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.internal.ib_bar_low = mmUVD_LMI_RBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.internal.ib_bar_high = mmUVD_LMI_RBC_IB_64BIT_BAR_HIGH_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.internal.ib_size = mmUVD_RBC_IB_SIZE_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.internal.gp_scratch8 = mmUVD_GP_SCRATCH8_INTERNAL_OFFSET;
>>>>>>>>> +
>>>>>>>>> +   adev->vcn.internal.scratch9 = mmUVD_SCRATCH9_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.inst[inst].external.scratch9 = SOC15_REG_OFFSET(VCN, inst, mmUVD_SCRATCH9);
>>>>>>>>> +   adev->vcn.internal.data0 = mmUVD_GPCOM_VCPU_DATA0_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.inst[inst].external.data0 = SOC15_REG_OFFSET(VCN, inst, mmUVD_GPCOM_VCPU_DATA0);
>>>>>>>>> +   adev->vcn.internal.data1 = mmUVD_GPCOM_VCPU_DATA1_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.inst[inst].external.data1 = SOC15_REG_OFFSET(VCN, inst, mmUVD_GPCOM_VCPU_DATA1);
>>>>>>>>> +   adev->vcn.internal.cmd = mmUVD_GPCOM_VCPU_CMD_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.inst[inst].external.cmd = SOC15_REG_OFFSET(VCN, inst, mmUVD_GPCOM_VCPU_CMD);
>>>>>>>>> +   adev->vcn.internal.nop = mmUVD_NO_OP_INTERNAL_OFFSET;
>>>>>>>>> +   adev->vcn.inst[inst].external.nop = SOC15_REG_OFFSET(VCN, inst, mmUVD_NO_OP);
>>>>>>>>> +
>>>>>>>>> +   ring = &adev->vcn.inst[inst].ring_dec;
>>>>>>>>> +   ring->use_doorbell = true;
>>>>>>>>> +
>>>>>>>>> +   ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
>>>>>>>>> +       (amdgpu_sriov_vf(adev) ? 2*inst : 8*inst);
>>>>>>>>> +
>>>>>>>>> +   if (amdgpu_ip_version(adev, UVD_HWIP, 0) == IP_VERSION(2, 5, 0))
>>>>>>>>> +       ring->vm_hub = AMDGPU_MMHUB1(0);
>>>>>>>>> +   else
>>>>>>>>> +       ring->vm_hub = AMDGPU_MMHUB0(0);
>>>>>>>>> +
>>>>>>>>> +   sprintf(ring->name, "vcn_dec_%d", inst);
>>>>>>>>> +   r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst[inst].irq,
>>>>>>>>> +                0, AMDGPU_RING_PRIO_DEFAULT, NULL);
>>>>>>>>> +   if (r)
>>>>>>>>> +       return r;
>>>>>>>>> +
>>>>>>>>> +   for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>>>> +       enum amdgpu_ring_priority_level hw_prio = amdgpu_vcn_get_enc_ring_prio(i);
>>>>>>>>> +
>>>>>>>>> +       ring = &adev->vcn.inst[inst].ring_enc[i];
>>>>>>>>>         ring->use_doorbell = true;
>>>>>>>>>         ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
>>>>>>>>> -           (amdgpu_sriov_vf(adev) ? 2*j : 8*j);
>>>>>>>>> +           (amdgpu_sriov_vf(adev) ? (1 + i + 2*inst) : (2 + i + 8*inst));
>>>>>>>>> -       if (amdgpu_ip_version(adev, UVD_HWIP, 0) == IP_VERSION(2, 5, 0))
>>>>>>>>> +       if (amdgpu_ip_version(adev, UVD_HWIP, 0) ==
>>>>>>>>> +           IP_VERSION(2, 5, 0))
>>>>>>>>>             ring->vm_hub = AMDGPU_MMHUB1(0);
>>>>>>>>>         else
>>>>>>>>>             ring->vm_hub = AMDGPU_MMHUB0(0);
>>>>>>>>> -       sprintf(ring->name, "vcn_dec_%d", j);
>>>>>>>>> -       r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst[j].irq,
>>>>>>>>> -                0, AMDGPU_RING_PRIO_DEFAULT, NULL);
>>>>>>>>> +       sprintf(ring->name, "vcn_enc_%d.%d", inst, i);
>>>>>>>>> +       r = amdgpu_ring_init(adev, ring, 512,
>>>>>>>>> +                &adev->vcn.inst[inst].irq, 0,
>>>>>>>>> +                hw_prio, NULL);
>>>>>>>>>         if (r)
>>>>>>>>>             return r;
>>>>>>>>> -
>>>>>>>>> -       for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>>>> -           enum amdgpu_ring_priority_level hw_prio = amdgpu_vcn_get_enc_ring_prio(i);
>>>>>>>>> -
>>>>>>>>> -           ring = &adev->vcn.inst[j].ring_enc[i];
>>>>>>>>> -           ring->use_doorbell = true;
>>>>>>>>> -
>>>>>>>>> -           ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
>>>>>>>>> -               (amdgpu_sriov_vf(adev) ? (1 + i + 2*j) : (2 + i + 8*j));
>>>>>>>>> -
>>>>>>>>> -           if (amdgpu_ip_version(adev, UVD_HWIP, 0) ==
>>>>>>>>> -               IP_VERSION(2, 5, 0))
>>>>>>>>> -               ring->vm_hub = AMDGPU_MMHUB1(0);
>>>>>>>>> -           else
>>>>>>>>> -               ring->vm_hub = AMDGPU_MMHUB0(0);
>>>>>>>>> -
>>>>>>>>> -           sprintf(ring->name, "vcn_enc_%d.%d", j, i);
>>>>>>>>> -           r = amdgpu_ring_init(adev, ring, 512,
>>>>>>>>> -                    &adev->vcn.inst[j].irq, 0,
>>>>>>>>> -                    hw_prio, NULL);
>>>>>>>>> -           if (r)
>>>>>>>>> -               return r;
>>>>>>>>> -       }
>>>>>>>>> -
>>>>>>>>> -       fw_shared = adev->vcn.inst[j].fw_shared.cpu_addr;
>>>>>>>>> -       fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_VCN_MULTI_QUEUE_FLAG);
>>>>>>>>> -
>>>>>>>>> -       if (amdgpu_vcnfw_log)
>>>>>>>>> -           amdgpu_vcn_fwlog_init(&adev->vcn.inst[i]);
>>>>>>>>> -   }
>>>>>>>>> +   fw_shared = adev->vcn.inst[inst].fw_shared.cpu_addr;
>>>>>>>>> +   fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_VCN_MULTI_QUEUE_FLAG);
>>>>>>>>> +
>>>>>>>>> +   if (amdgpu_vcnfw_log)
>>>>>>>>> +       amdgpu_vcn_fwlog_init(&adev->vcn.inst[i]);
>>>>>>>>> +done:
>>>>>>>>>     if (amdgpu_sriov_vf(adev)) {
>>>>>>>>>         r = amdgpu_virt_alloc_mm_table(adev);
>>>>>>>>>         if (r)
>>>>>>>>> @@ -1005,197 +1002,192 @@ static int vcn_v2_5_start_dpg_mode(struct amdgpu_device *adev, int inst_idx, boo
>>>>>>>>>     return 0;
>>>>>>>>>  }
>>>>>>>>> -static int vcn_v2_5_start(struct amdgpu_device *adev)
>>>>>>>>> +static int vcn_v2_5_start(struct amdgpu_device *adev, unsigned int inst)
>>>>>>>>>  {
>>>>>>>>>     struct amdgpu_ring *ring;
>>>>>>>>>     uint32_t rb_bufsz, tmp;
>>>>>>>>> -   int i, j, k, r;
>>>>>>>>> +   int j, k, r;
>>>>>>>>> -   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>> -       if (adev->pm.dpm_enabled)
>>>>>>>>> -           amdgpu_dpm_enable_vcn(adev, true, i);
>>>>>>>>> -   }
>>>>>>>>> -
>>>>>>>>> -   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>> -           continue;
>>>>>>>>> -       if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
>>>>>>>>> -           r = vcn_v2_5_start_dpg_mode(adev, i, adev->vcn.indirect_sram);
>>>>>>>>> -           continue;
>>>>>>>>> -       }
>>>>>>>>> +   if (adev->pm.dpm_enabled)
>>>>>>>>> +       amdgpu_dpm_enable_vcn(adev, true, inst);
>>>>>>>>> -       /* disable register anti-hang mechanism */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_POWER_STATUS), 0,
>>>>>>>>> -           ~UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>>>>>> +   if (adev->vcn.harvest_config & (1 << inst))
>>>>>>>>> +       return 0;
>>>>>>>>> -       /* set uvd status busy */
>>>>>>>>> -       tmp = RREG32_SOC15(VCN, i, mmUVD_STATUS) | UVD_STATUS__UVD_BUSY;
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_STATUS, tmp);
>>>>>>>>> +   if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
>>>>>>>>> +       r = vcn_v2_5_start_dpg_mode(adev, inst, adev->vcn.indirect_sram);
>>>>>>>>> +       return r;
>>>>>>>>>     }
>>>>>>>>> +   /* disable register anti-hang mechanism */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_POWER_STATUS), 0,
>>>>>>>>> +       ~UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>>>>>> +
>>>>>>>>> +   /* set uvd status busy */
>>>>>>>>> +   tmp = RREG32_SOC15(VCN, inst, mmUVD_STATUS) | UVD_STATUS__UVD_BUSY;
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_STATUS, tmp);
>>>>>>>>> +
>>>>>>>>>     if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)
>>>>>>>>>         return 0;
>>>>>>>>>     /*SW clock gating */
>>>>>>>>>     vcn_v2_5_disable_clock_gating(adev);
>>>>>>>>> -   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>> -           continue;
>>>>>>>>> -       /* enable VCPU clock */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_VCPU_CNTL),
>>>>>>>>> -           UVD_VCPU_CNTL__CLK_EN_MASK, ~UVD_VCPU_CNTL__CLK_EN_MASK);
>>>>>>>>> -
>>>>>>>>> -       /* disable master interrupt */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_MASTINT_EN), 0,
>>>>>>>>> -           ~UVD_MASTINT_EN__VCPU_EN_MASK);
>>>>>>>>> -
>>>>>>>>> -       /* setup mmUVD_LMI_CTRL */
>>>>>>>>> -       tmp = RREG32_SOC15(VCN, i, mmUVD_LMI_CTRL);
>>>>>>>>> -       tmp &= ~0xff;
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_LMI_CTRL, tmp | 0x8|
>>>>>>>>> -           UVD_LMI_CTRL__WRITE_CLEAN_TIMER_EN_MASK |
>>>>>>>>> -           UVD_LMI_CTRL__MASK_MC_URGENT_MASK |
>>>>>>>>> -           UVD_LMI_CTRL__DATA_COHERENCY_EN_MASK |
>>>>>>>>> -           UVD_LMI_CTRL__VCPU_DATA_COHERENCY_EN_MASK);
>>>>>>>>> -
>>>>>>>>> -       /* setup mmUVD_MPC_CNTL */
>>>>>>>>> -       tmp = RREG32_SOC15(VCN, i, mmUVD_MPC_CNTL);
>>>>>>>>> -       tmp &= ~UVD_MPC_CNTL__REPLACEMENT_MODE_MASK;
>>>>>>>>> -       tmp |= 0x2 << UVD_MPC_CNTL__REPLACEMENT_MODE__SHIFT;
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_MPC_CNTL, tmp);
>>>>>>>>> -
>>>>>>>>> -       /* setup UVD_MPC_SET_MUXA0 */
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_MPC_SET_MUXA0,
>>>>>>>>> -           ((0x1 << UVD_MPC_SET_MUXA0__VARA_1__SHIFT) |
>>>>>>>>> -            (0x2 << UVD_MPC_SET_MUXA0__VARA_2__SHIFT) |
>>>>>>>>> -            (0x3 << UVD_MPC_SET_MUXA0__VARA_3__SHIFT) |
>>>>>>>>> -            (0x4 << UVD_MPC_SET_MUXA0__VARA_4__SHIFT)));
>>>>>>>>> -
>>>>>>>>> -       /* setup UVD_MPC_SET_MUXB0 */
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_MPC_SET_MUXB0,
>>>>>>>>> -           ((0x1 << UVD_MPC_SET_MUXB0__VARB_1__SHIFT) |
>>>>>>>>> -            (0x2 << UVD_MPC_SET_MUXB0__VARB_2__SHIFT) |
>>>>>>>>> -            (0x3 << UVD_MPC_SET_MUXB0__VARB_3__SHIFT) |
>>>>>>>>> -            (0x4 << UVD_MPC_SET_MUXB0__VARB_4__SHIFT)));
>>>>>>>>> -
>>>>>>>>> -       /* setup mmUVD_MPC_SET_MUX */
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_MPC_SET_MUX,
>>>>>>>>> -           ((0x0 << UVD_MPC_SET_MUX__SET_0__SHIFT) |
>>>>>>>>> -            (0x1 << UVD_MPC_SET_MUX__SET_1__SHIFT) |
>>>>>>>>> -            (0x2 << UVD_MPC_SET_MUX__SET_2__SHIFT)));
>>>>>>>>> -   }
>>>>>>>>> +   if (adev->vcn.harvest_config & (1 << inst))
>>>>>>>>> +       return 0;
>>>>>>>>> +
>>>>>>>>> +   /* enable VCPU clock */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_VCPU_CNTL),
>>>>>>>>> +       UVD_VCPU_CNTL__CLK_EN_MASK, ~UVD_VCPU_CNTL__CLK_EN_MASK);
>>>>>>>>> +
>>>>>>>>> +   /* disable master interrupt */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_MASTINT_EN), 0,
>>>>>>>>> +       ~UVD_MASTINT_EN__VCPU_EN_MASK);
>>>>>>>>> +
>>>>>>>>> +   /* setup mmUVD_LMI_CTRL */
>>>>>>>>> +   tmp = RREG32_SOC15(VCN, inst, mmUVD_LMI_CTRL);
>>>>>>>>> +   tmp &= ~0xff;
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_LMI_CTRL, tmp | 0x8|
>>>>>>>>> +       UVD_LMI_CTRL__WRITE_CLEAN_TIMER_EN_MASK |
>>>>>>>>> +       UVD_LMI_CTRL__MASK_MC_URGENT_MASK |
>>>>>>>>> +       UVD_LMI_CTRL__DATA_COHERENCY_EN_MASK |
>>>>>>>>> +       UVD_LMI_CTRL__VCPU_DATA_COHERENCY_EN_MASK);
>>>>>>>>> +
>>>>>>>>> +   /* setup mmUVD_MPC_CNTL */
>>>>>>>>> +   tmp = RREG32_SOC15(VCN, inst, mmUVD_MPC_CNTL);
>>>>>>>>> +   tmp &= ~UVD_MPC_CNTL__REPLACEMENT_MODE_MASK;
>>>>>>>>> +   tmp |= 0x2 << UVD_MPC_CNTL__REPLACEMENT_MODE__SHIFT;
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_MPC_CNTL, tmp);
>>>>>>>>> +
>>>>>>>>> +   /* setup UVD_MPC_SET_MUXA0 */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_MPC_SET_MUXA0,
>>>>>>>>> +       ((0x1 << UVD_MPC_SET_MUXA0__VARA_1__SHIFT) |
>>>>>>>>> +        (0x2 << UVD_MPC_SET_MUXA0__VARA_2__SHIFT) |
>>>>>>>>> +        (0x3 << UVD_MPC_SET_MUXA0__VARA_3__SHIFT) |
>>>>>>>>> +        (0x4 << UVD_MPC_SET_MUXA0__VARA_4__SHIFT)));
>>>>>>>>> +
>>>>>>>>> +   /* setup UVD_MPC_SET_MUXB0 */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_MPC_SET_MUXB0,
>>>>>>>>> +       ((0x1 << UVD_MPC_SET_MUXB0__VARB_1__SHIFT) |
>>>>>>>>> +        (0x2 << UVD_MPC_SET_MUXB0__VARB_2__SHIFT) |
>>>>>>>>> +        (0x3 << UVD_MPC_SET_MUXB0__VARB_3__SHIFT) |
>>>>>>>>> +        (0x4 << UVD_MPC_SET_MUXB0__VARB_4__SHIFT)));
>>>>>>>>> +
>>>>>>>>> +   /* setup mmUVD_MPC_SET_MUX */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_MPC_SET_MUX,
>>>>>>>>> +       ((0x0 << UVD_MPC_SET_MUX__SET_0__SHIFT) |
>>>>>>>>> +        (0x1 << UVD_MPC_SET_MUX__SET_1__SHIFT) |
>>>>>>>>> +        (0x2 << UVD_MPC_SET_MUX__SET_2__SHIFT)));
>>>>>>>>>     vcn_v2_5_mc_resume(adev);
>>>>>>>>> -   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>> -       volatile struct amdgpu_fw_shared *fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr;
>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>> -           continue;
>>>>>>>>> -       /* VCN global tiling registers */
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_GFX8_ADDR_CONFIG,
>>>>>>>>> -           adev->gfx.config.gb_addr_config);
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_GFX8_ADDR_CONFIG,
>>>>>>>>> -           adev->gfx.config.gb_addr_config);
>>>>>>>>> +   volatile struct amdgpu_fw_shared *fw_shared = adev->vcn.inst[inst].fw_shared.cpu_addr;
>>>>>>>>> +   if (adev->vcn.harvest_config & (1 << inst))
>>>>>>>>> +       return 0;
>>>>>>>>> +
>>>>>>>>> +   /* VCN global tiling registers */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_GFX8_ADDR_CONFIG,
>>>>>>>>> +       adev->gfx.config.gb_addr_config);
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_GFX8_ADDR_CONFIG,
>>>>>>>>> +       adev->gfx.config.gb_addr_config);
>>>>>>>>> -   /* enable LMI MC and UMC channels */
>>>>>>>>> -   WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_LMI_CTRL2), 0,
>>>>>>>>> -       ~UVD_LMI_CTRL2__STALL_ARB_UMC_MASK);
>>>>>>>>> +   /* enable LMI MC and UMC channels */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_LMI_CTRL2), 0,
>>>>>>>>> +       ~UVD_LMI_CTRL2__STALL_ARB_UMC_MASK);
>>>>>>>>> -   /* unblock VCPU register access */
>>>>>>>>> -   WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_RB_ARB_CTRL), 0,
>>>>>>>>> -       ~UVD_RB_ARB_CTRL__VCPU_DIS_MASK);
>>>>>>>>> +   /* unblock VCPU register access */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_RB_ARB_CTRL), 0,
>>>>>>>>> +       ~UVD_RB_ARB_CTRL__VCPU_DIS_MASK);
>>>>>>>>> -   WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_VCPU_CNTL), 0,
>>>>>>>>> -       ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_VCPU_CNTL), 0,
>>>>>>>>> +       ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> -   for (k = 0; k < 10; ++k) {
>>>>>>>>> -       uint32_t status;
>>>>>>>>> -
>>>>>>>>> -       for (j = 0; j < 100; ++j) {
>>>>>>>>> -           status = RREG32_SOC15(VCN, i, mmUVD_STATUS);
>>>>>>>>> -           if (status & 2)
>>>>>>>>> -               break;
>>>>>>>>> -           if (amdgpu_emu_mode == 1)
>>>>>>>>> -               msleep(500);
>>>>>>>>> -           else
>>>>>>>>> -               mdelay(10);
>>>>>>>>> -       }
>>>>>>>>> -       r = 0;
>>>>>>>>> +   for (k = 0; k < 10; ++k) {
>>>>>>>>> +       uint32_t status;
>>>>>>>>> +
>>>>>>>>> +       for (j = 0; j < 100; ++j) {
>>>>>>>>> +           status = RREG32_SOC15(VCN, inst, mmUVD_STATUS);
>>>>>>>>>             if (status & 2)
>>>>>>>>>                 break;
>>>>>>>>> +           if (amdgpu_emu_mode == 1)
>>>>>>>>> +               msleep(500);
>>>>>>>>> +           else
>>>>>>>>> +               mdelay(10);
>>>>>>>>> +       }
>>>>>>>>> +       r = 0;
>>>>>>>>> +       if (status & 2)
>>>>>>>>> +           break;
>>>>>>>>> -       DRM_ERROR("VCN decode not responding, trying to reset the VCPU!!!\n");
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_VCPU_CNTL),
>>>>>>>>> -           UVD_VCPU_CNTL__BLK_RST_MASK,
>>>>>>>>> -           ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> -       mdelay(10);
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_VCPU_CNTL), 0,
>>>>>>>>> -           ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> +       DRM_ERROR("VCN decode not responding, trying to reset the VCPU!!!\n");
>>>>>>>>> +       WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_VCPU_CNTL),
>>>>>>>>> +           UVD_VCPU_CNTL__BLK_RST_MASK,
>>>>>>>>> +           ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> +       mdelay(10);
>>>>>>>>> +       WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_VCPU_CNTL), 0,
>>>>>>>>> +           ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> -       mdelay(10);
>>>>>>>>> -       r = -1;
>>>>>>>>> -   }
>>>>>>>>> +       mdelay(10);
>>>>>>>>> +       r = -1;
>>>>>>>>> +   }
>>>>>>>>> -   if (r) {
>>>>>>>>> -       DRM_ERROR("VCN decode not responding, giving up!!!\n");
>>>>>>>>> -       return r;
>>>>>>>>> -   }
>>>>>>>>> +   if (r) {
>>>>>>>>> +       DRM_ERROR("VCN decode not responding, giving up!!!\n");
>>>>>>>>> +       return r;
>>>>>>>>> +   }
>>>>>>>>> -   /* enable master interrupt */
>>>>>>>>> -   WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_MASTINT_EN),
>>>>>>>>> -       UVD_MASTINT_EN__VCPU_EN_MASK,
>>>>>>>>> -       ~UVD_MASTINT_EN__VCPU_EN_MASK);
>>>>>>>>> +   /* enable master interrupt */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_MASTINT_EN),
>>>>>>>>> +       UVD_MASTINT_EN__VCPU_EN_MASK,
>>>>>>>>> +       ~UVD_MASTINT_EN__VCPU_EN_MASK);
>>>>>>>>> -   /* clear the busy bit of VCN_STATUS */
>>>>>>>>> -   WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_STATUS), 0,
>>>>>>>>> -       ~(2 << UVD_STATUS__VCPU_REPORT__SHIFT));
>>>>>>>>> +   /* clear the busy bit of VCN_STATUS */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_STATUS), 0,
>>>>>>>>> +       ~(2 << UVD_STATUS__VCPU_REPORT__SHIFT));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_LMI_RBC_RB_VMID, 0);
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_LMI_RBC_RB_VMID, 0);
>>>>>>>>> -   ring = &adev->vcn.inst[i].ring_dec;
>>>>>>>>> -   /* force RBC into idle state */
>>>>>>>>> -   rb_bufsz = order_base_2(ring->ring_size);
>>>>>>>>> -   tmp = REG_SET_FIELD(0, UVD_RBC_RB_CNTL, RB_BUFSZ, rb_bufsz);
>>>>>>>>> -   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_BLKSZ, 1);
>>>>>>>>> -   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_NO_FETCH, 1);
>>>>>>>>> -   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_NO_UPDATE, 1);
>>>>>>>>> -   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_RPTR_WR_EN, 1);
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RBC_RB_CNTL, tmp);
>>>>>>>>> +   ring = &adev->vcn.inst[inst].ring_dec;
>>>>>>>>> +   /* force RBC into idle state */
>>>>>>>>> +   rb_bufsz = order_base_2(ring->ring_size);
>>>>>>>>> +   tmp = REG_SET_FIELD(0, UVD_RBC_RB_CNTL, RB_BUFSZ, rb_bufsz);
>>>>>>>>> +   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_BLKSZ, 1);
>>>>>>>>> +   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_NO_FETCH, 1);
>>>>>>>>> +   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_NO_UPDATE, 1);
>>>>>>>>> +   tmp = REG_SET_FIELD(tmp, UVD_RBC_RB_CNTL, RB_RPTR_WR_EN, 1);
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RBC_RB_CNTL, tmp);
>>>>>>>>> -   fw_shared->multi_queue.decode_queue_mode |= FW_QUEUE_RING_RESET;
>>>>>>>>> -   /* program the RB_BASE for ring buffer */
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_LMI_RBC_RB_64BIT_BAR_LOW,
>>>>>>>>> -       lower_32_bits(ring->gpu_addr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_LMI_RBC_RB_64BIT_BAR_HIGH,
>>>>>>>>> -       upper_32_bits(ring->gpu_addr));
>>>>>>>>> +   fw_shared->multi_queue.decode_queue_mode |= FW_QUEUE_RING_RESET;
>>>>>>>>> +   /* program the RB_BASE for ring buffer */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_LMI_RBC_RB_64BIT_BAR_LOW,
>>>>>>>>> +       lower_32_bits(ring->gpu_addr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_LMI_RBC_RB_64BIT_BAR_HIGH,
>>>>>>>>> +       upper_32_bits(ring->gpu_addr));
>>>>>>>>> -   /* Initialize the ring buffer's read and write pointers */
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RBC_RB_RPTR, 0);
>>>>>>>>> +   /* Initialize the ring buffer's read and write pointers */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RBC_RB_RPTR, 0);
>>>>>>>>> -   ring->wptr = RREG32_SOC15(VCN, i, mmUVD_RBC_RB_RPTR);
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RBC_RB_WPTR,
>>>>>>>>> -       lower_32_bits(ring->wptr));
>>>>>>>>> -   fw_shared->multi_queue.decode_queue_mode &= ~FW_QUEUE_RING_RESET;
>>>>>>>>> +   ring->wptr = RREG32_SOC15(VCN, inst, mmUVD_RBC_RB_RPTR);
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RBC_RB_WPTR,
>>>>>>>>> +       lower_32_bits(ring->wptr));
>>>>>>>>> +   fw_shared->multi_queue.decode_queue_mode &= ~FW_QUEUE_RING_RESET;
>>>>>>>>> -   fw_shared->multi_queue.encode_generalpurpose_queue_mode |= FW_QUEUE_RING_RESET;
>>>>>>>>> -   ring = &adev->vcn.inst[i].ring_enc[0];
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_RPTR, lower_32_bits(ring->wptr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_WPTR, lower_32_bits(ring->wptr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_BASE_LO, ring->gpu_addr);
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_SIZE, ring->ring_size / 4);
>>>>>>>>> -   fw_shared->multi_queue.encode_generalpurpose_queue_mode &= ~FW_QUEUE_RING_RESET;
>>>>>>>>> -
>>>>>>>>> -   fw_shared->multi_queue.encode_lowlatency_queue_mode |= FW_QUEUE_RING_RESET;
>>>>>>>>> -   ring = &adev->vcn.inst[i].ring_enc[1];
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_RPTR2, lower_32_bits(ring->wptr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_WPTR2, lower_32_bits(ring->wptr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_BASE_LO2, ring->gpu_addr);
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_BASE_HI2, upper_32_bits(ring->gpu_addr));
>>>>>>>>> -   WREG32_SOC15(VCN, i, mmUVD_RB_SIZE2, ring->ring_size / 4);
>>>>>>>>> -   fw_shared->multi_queue.encode_lowlatency_queue_mode &= ~FW_QUEUE_RING_RESET;
>>>>>>>>> -   }
>>>>>>>>> +   fw_shared->multi_queue.encode_generalpurpose_queue_mode |= FW_QUEUE_RING_RESET;
>>>>>>>>> +   ring = &adev->vcn.inst[inst].ring_enc[0];
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_RPTR, lower_32_bits(ring->wptr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_WPTR, lower_32_bits(ring->wptr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_BASE_LO, ring->gpu_addr);
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_SIZE, ring->ring_size / 4);
>>>>>>>>> +   fw_shared->multi_queue.encode_generalpurpose_queue_mode &= ~FW_QUEUE_RING_RESET;
>>>>>>>>> +
>>>>>>>>> +   fw_shared->multi_queue.encode_lowlatency_queue_mode |= FW_QUEUE_RING_RESET;
>>>>>>>>> +   ring = &adev->vcn.inst[inst].ring_enc[1];
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_RPTR2, lower_32_bits(ring->wptr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_WPTR2, lower_32_bits(ring->wptr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_BASE_LO2, ring->gpu_addr);
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_BASE_HI2, upper_32_bits(ring->gpu_addr));
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_RB_SIZE2, ring->ring_size / 4);
>>>>>>>>> +   fw_shared->multi_queue.encode_lowlatency_queue_mode &= ~FW_QUEUE_RING_RESET;
>>>>>>>>>     return 0;
>>>>>>>>>  }
>>>>>>>>> @@ -1424,72 +1416,69 @@ static int vcn_v2_5_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
>>>>>>>>>     return 0;
>>>>>>>>>  }
>>>>>>>>> -static int vcn_v2_5_stop(struct amdgpu_device *adev)
>>>>>>>>> +static int vcn_v2_5_stop(struct amdgpu_device *adev, unsigned int inst)
>>>>>>>>>  {
>>>>>>>>>     uint32_t tmp;
>>>>>>>>> -   int i, r = 0;
>>>>>>>>> +   int r = 0;
>>>>>>>>> -   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>> -       if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>> -           continue;
>>>>>>>>> -       if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
>>>>>>>>> -           r = vcn_v2_5_stop_dpg_mode(adev, i);
>>>>>>>>> -           continue;
>>>>>>>>> -       }
>>>>>>>>> +   if (adev->vcn.harvest_config & (1 << inst))
>>>>>>>>> +       goto done;
>>>>>>>>> -       /* wait for vcn idle */
>>>>>>>>> -       r = SOC15_WAIT_ON_RREG(VCN, i, mmUVD_STATUS, UVD_STATUS__IDLE, 0x7);
>>>>>>>>> -       if (r)
>>>>>>>>> -           return r;
>>>>>>>>> +   if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
>>>>>>>>> +       r = vcn_v2_5_stop_dpg_mode(adev, inst);
>>>>>>>>> +       goto done;
>>>>>>>>> +   }
>>>>>>>>> -       tmp = UVD_LMI_STATUS__VCPU_LMI_WRITE_CLEAN_MASK |
>>>>>>>>> -           UVD_LMI_STATUS__READ_CLEAN_MASK |
>>>>>>>>> -           UVD_LMI_STATUS__WRITE_CLEAN_MASK |
>>>>>>>>> -           UVD_LMI_STATUS__WRITE_CLEAN_RAW_MASK;
>>>>>>>>> -       r = SOC15_WAIT_ON_RREG(VCN, i, mmUVD_LMI_STATUS, tmp, tmp);
>>>>>>>>> -       if (r)
>>>>>>>>> -           return r;
>>>>>>>>> +   /* wait for vcn idle */
>>>>>>>>> +   r = SOC15_WAIT_ON_RREG(VCN, inst, mmUVD_STATUS, UVD_STATUS__IDLE, 0x7);
>>>>>>>>> +   if (r)
>>>>>>>>> +       return r;
>>>>>>>>> -       /* block LMI UMC channel */
>>>>>>>>> -       tmp = RREG32_SOC15(VCN, i, mmUVD_LMI_CTRL2);
>>>>>>>>> -       tmp |= UVD_LMI_CTRL2__STALL_ARB_UMC_MASK;
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_LMI_CTRL2, tmp);
>>>>>>>>> +   tmp = UVD_LMI_STATUS__VCPU_LMI_WRITE_CLEAN_MASK |
>>>>>>>>> +       UVD_LMI_STATUS__READ_CLEAN_MASK |
>>>>>>>>> +       UVD_LMI_STATUS__WRITE_CLEAN_MASK |
>>>>>>>>> +       UVD_LMI_STATUS__WRITE_CLEAN_RAW_MASK;
>>>>>>>>> +   r = SOC15_WAIT_ON_RREG(VCN, inst, mmUVD_LMI_STATUS, tmp, tmp);
>>>>>>>>> +   if (r)
>>>>>>>>> +       return r;
>>>>>>>>> -       tmp = UVD_LMI_STATUS__UMC_READ_CLEAN_RAW_MASK|
>>>>>>>>> -           UVD_LMI_STATUS__UMC_WRITE_CLEAN_RAW_MASK;
>>>>>>>>> -       r = SOC15_WAIT_ON_RREG(VCN, i, mmUVD_LMI_STATUS, tmp, tmp);
>>>>>>>>> -       if (r)
>>>>>>>>> -           return r;
>>>>>>>>> +   /* block LMI UMC channel */
>>>>>>>>> +   tmp = RREG32_SOC15(VCN, inst, mmUVD_LMI_CTRL2);
>>>>>>>>> +   tmp |= UVD_LMI_CTRL2__STALL_ARB_UMC_MASK;
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_LMI_CTRL2, tmp);
>>>>>>>>> -       /* block VCPU register access */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_RB_ARB_CTRL),
>>>>>>>>> -           UVD_RB_ARB_CTRL__VCPU_DIS_MASK,
>>>>>>>>> -           ~UVD_RB_ARB_CTRL__VCPU_DIS_MASK);
>>>>>>>>> +   tmp = UVD_LMI_STATUS__UMC_READ_CLEAN_RAW_MASK|
>>>>>>>>> +       UVD_LMI_STATUS__UMC_WRITE_CLEAN_RAW_MASK;
>>>>>>>>> +   r = SOC15_WAIT_ON_RREG(VCN, inst, mmUVD_LMI_STATUS, tmp, tmp);
>>>>>>>>> +   if (r)
>>>>>>>>> +       return r;
>>>>>>>>> -       /* reset VCPU */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_VCPU_CNTL),
>>>>>>>>> -           UVD_VCPU_CNTL__BLK_RST_MASK,
>>>>>>>>> -           ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> +   /* block VCPU register access */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_RB_ARB_CTRL),
>>>>>>>>> +       UVD_RB_ARB_CTRL__VCPU_DIS_MASK,
>>>>>>>>> +       ~UVD_RB_ARB_CTRL__VCPU_DIS_MASK);
>>>>>>>>> -       /* disable VCPU clock */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_VCPU_CNTL), 0,
>>>>>>>>> -           ~(UVD_VCPU_CNTL__CLK_EN_MASK));
>>>>>>>>> +   /* reset VCPU */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_VCPU_CNTL),
>>>>>>>>> +       UVD_VCPU_CNTL__BLK_RST_MASK,
>>>>>>>>> +       ~UVD_VCPU_CNTL__BLK_RST_MASK);
>>>>>>>>> -       /* clear status */
>>>>>>>>> -       WREG32_SOC15(VCN, i, mmUVD_STATUS, 0);
>>>>>>>>> +   /* disable VCPU clock */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_VCPU_CNTL), 0,
>>>>>>>>> +       ~(UVD_VCPU_CNTL__CLK_EN_MASK));
>>>>>>>>> -       vcn_v2_5_enable_clock_gating(adev);
>>>>>>>>> +   /* clear status */
>>>>>>>>> +   WREG32_SOC15(VCN, inst, mmUVD_STATUS, 0);
>>>>>>>>> -       /* enable register anti-hang mechanism */
>>>>>>>>> -       WREG32_P(SOC15_REG_OFFSET(VCN, i, mmUVD_POWER_STATUS),
>>>>>>>>> -           UVD_POWER_STATUS__UVD_POWER_STATUS_MASK,
>>>>>>>>> -           ~UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>>>>>> -   }
>>>>>>>>> +   vcn_v2_5_enable_clock_gating(adev);
>>>>>>>>> -   for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>> -       if (adev->pm.dpm_enabled)
>>>>>>>>> -           amdgpu_dpm_enable_vcn(adev, false, i);
>>>>>>>>> -   }
>>>>>>>>> +   /* enable register anti-hang mechanism */
>>>>>>>>> +   WREG32_P(SOC15_REG_OFFSET(VCN, inst, mmUVD_POWER_STATUS),
>>>>>>>>> +       UVD_POWER_STATUS__UVD_POWER_STATUS_MASK,
>>>>>>>>> +       ~UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>>>>>>>>> +done:
>>>>>>>>> +   if (adev->pm.dpm_enabled)
>>>>>>>>> +       amdgpu_dpm_enable_vcn(adev, false, inst);
>>>>>>>>>     return 0;
>>>>>>>>>  }
>>>>>>>>> @@ -1838,9 +1827,9 @@ static int vcn_v2_5_set_powergating_state(struct amdgpu_ip_block *ip_block,
>>>>>>>>>         return 0;
>>>>>>>>>     if (state == AMD_PG_STATE_GATE)
>>>>>>>>> -       ret = vcn_v2_5_stop(adev);
>>>>>>>>> +       ret = vcn_v2_5_stop(adev, inst);
>>>>>>>>>     else
>>>>>>>>> -       ret = vcn_v2_5_start(adev);
>>>>>>>>> +       ret = vcn_v2_5_start(adev, inst);
>>>>>>>>>     if (!ret)
>>>>>>>>>         adev->vcn.cur_state[inst] = state;
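
To make the end state of the final hunk concrete: with this series, each
VCN instance is its own entry in adev->ip_blocks[], so gating one
instance means invoking set_powergating_state() on one block. A condensed
sketch of what the generic layer would do, illustrative rather than
verbatim driver code:

    /* gate/ungate every VCN instance, one ip_block at a time */
    for (i = 0; i < adev->num_ip_blocks; i++) {
        struct amdgpu_ip_block *block = &adev->ip_blocks[i];

        if (block->version->type != AMD_IP_BLOCK_TYPE_VCN)
            continue;

        /* vcn_v2_5_set_powergating_state() acts only on block->instance */
        block->version->funcs->set_powergating_state(block, state);
    }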