On Mon, Sep 5, 2022 at 1:27 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote: > > > > On 9/2/2022 1:09 AM, Alex Deucher wrote: > > How about this? We should just move HDP remap to gmc hw_init since > > that is mainly what uses it anyway. > > > > Sorry, I missed to R-B the previous version. Did you find any problem > when common block is initialized first? > One of the users on the bug report reported an issue with that patch, that said, that user seems to be seeing a slightly different issue since he is on a gfx9 card and the original reporter was on gfx10. https://bugzilla.kernel.org/show_bug.cgi?id=216373 Alex > I think that patch provides a consistent IP init sequence during cold > init and resume. > > Thanks, > Lijo > > > Alex > > > > On Tue, Aug 30, 2022 at 2:05 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote: > >> > >> On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote: > >>> > >>> > >>> > >>> On 8/30/2022 8:39 PM, Alex Deucher wrote: > >>>> On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 8/30/2022 7:18 PM, Alex Deucher wrote: > >>>>>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 8/29/2022 10:20 PM, Alex Deucher wrote: > >>>>>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@xxxxxxx> wrote: > >>>>>>>>> > >>>>>>>>> HDP flush is used early in the init sequence as part of memory controller > >>>>>>>>> block initialization. Hence remapping of HDP registers needed for flush > >>>>>>>>> needs to happen earlier. > >>>>>>>>> > >>>>>>>>> This also fixes the Unsupported Request error reported through AER during > >>>>>>>>> driver load. The error happens as a write happens to the remap offset > >>>>>>>>> before real remapping is done. > >>>>>>>>> > >>>>>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&data=05%7C01%7Clijo.lazar%40amd.com%7C4e59ae0f8ed54aa7c5a208da8c51aa29%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637976579623485975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WTcd9ImY7Oba0MT6oQ7%2B7UEBkdS6azvqgYoK%2B%2F4mJPg%3D&reserved=0 > >>>>>>>>> > >>>>>>>>> The error was unnoticed before and got visible because of the commit > >>>>>>>>> referenced below. This doesn't fix anything in the commit below, rather > >>>>>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only > >>>>>>>>> to associate this commit with below one so that both go together. > >>>>>>>>> > >>>>>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") > >>>>>>>>> > >>>>>>>>> Reported-by: Tom Seewald <tseewald@xxxxxxxxx> > >>>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> > >>>>>>>>> Cc: stable@xxxxxxxxxxxxxxx > >>>>>>>> > >>>>>>>> How about something like the attached patch rather than these two > >>>>>>>> patches? It's a bit bigger but seems cleaner and more defensive in my > >>>>>>>> opinion. > >>>>>>>> > >>>>>>> > >>>>>>> Whenever device goes to suspend/reset and then comes back, remap offset > >>>>>>> has to be set back to 0 to make sure it doesn't use the wrong offset > >>>>>>> when the register assumes default values again. > >>>>>>> > >>>>>>> To avoid the if-check in hdp_flush (which is more frequent), another way > >>>>>>> is to initialize the remap offset to default offset during early init > >>>>>>> and hw fini/suspend sequences. It won't be obvious (even with this > >>>>>>> patch) as to when remap offset vs default offset is used though. > >>>>>> > >>>>>> On resume, the common IP is resumed first so it will always be set. > >>>>>> The only case that is a problem is init because we init GMC out of > >>>>>> order. We could init common before GMC in amdgpu_device_ip_init(). I > >>>>>> think that should be fine, but I wasn't sure if there might be some > >>>>>> fallout from that on certain cards. > >>>>>> > >>>>> > >>>>> There are other places where an IP order is forced like in > >>>>> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this > >>>>> case as remapping is not done for VF. > >>>>> > >>>>> Agree that a better way is to have the common IP to be inited first. > >>>> > >>>> How about these patches? > >>>> > >>> > >>> Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush > >>> is not expected to be used)? > >> > >> It would be used in some cases, e.g., GPU VM passthrough where we use > >> the BAR rather than the carve out. > >> > >> Alex > >> > >> > >>> > >>> Thanks, > >>> Lijo > >>> > >>>> Alex > >>>> > >>>> > >>>>> > >>>>> Thanks, > >>>>> Lijo > >>>>> > >>>>>> Alex > >>>>>> > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Lijo > >>>>>>> > >>>>>>>> Alex > >>>>>>>> > >>>>>>>>> --- > >>>>>>>>> v2: > >>>>>>>>> Take care of IP resume cases (Alex Deucher) > >>>>>>>>> Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling) > >>>>>>>>> Add more details in commit message and associate with AER patch (Bjorn > >>>>>>>>> Helgaas) > >>>>>>>>> > >>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++ > >>>>>>>>> drivers/gpu/drm/amd/amdgpu/nv.c | 6 ------ > >>>>>>>>> drivers/gpu/drm/amd/amdgpu/soc15.c | 6 ------ > >>>>>>>>> drivers/gpu/drm/amd/amdgpu/soc21.c | 6 ------ > >>>>>>>>> 4 files changed, 24 insertions(+), 18 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>>>>>>>> index ce7d117efdb5..e420118769a5 100644 > >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>>>>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) > >>>>>>>>> return 0; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +/** > >>>>>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization > >>>>>>>>> + * > >>>>>>>>> + * @adev: amdgpu_device pointer > >>>>>>>>> + * > >>>>>>>>> + * Any common hardware initialization sequence that needs to be done before > >>>>>>>>> + * hw init of individual IPs is performed here. This is different from the > >>>>>>>>> + * 'common block' which initializes a set of IPs. > >>>>>>>>> + */ > >>>>>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev) > >>>>>>>>> +{ > >>>>>>>>> + /* Remap HDP registers to a hole in mmio space, for the purpose > >>>>>>>>> + * of exposing those registers to process space. This needs to be > >>>>>>>>> + * done before hw init of ip blocks to take care of HDP flush > >>>>>>>>> + * operations through registers during hw_init. > >>>>>>>>> + */ > >>>>>>>>> + if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers && > >>>>>>>>> + !amdgpu_sriov_vf(adev)) > >>>>>>>>> + adev->nbio.funcs->remap_hdp_registers(adev); > >>>>>>>>> +} > >>>>>>>>> > >>>>>>>>> /** > >>>>>>>>> * amdgpu_device_ip_init - run init for hardware IPs > >>>>>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) > >>>>>>>>> DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r); > >>>>>>>>> goto init_failed; > >>>>>>>>> } > >>>>>>>>> + > >>>>>>>>> + amdgpu_device_prepare_ip(adev); > >>>>>>>>> r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev); > >>>>>>>>> if (r) { > >>>>>>>>> DRM_ERROR("hw_init %d failed %d\n", i, r); > >>>>>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev) > >>>>>>>>> AMD_IP_BLOCK_TYPE_IH, > >>>>>>>>> }; > >>>>>>>>> > >>>>>>>>> + amdgpu_device_prepare_ip(adev); > >>>>>>>>> for (i = 0; i < adev->num_ip_blocks; i++) { > >>>>>>>>> int j; > >>>>>>>>> struct amdgpu_ip_block *block; > >>>>>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev) > >>>>>>>>> { > >>>>>>>>> int i, r; > >>>>>>>>> > >>>>>>>>> + amdgpu_device_prepare_ip(adev); > >>>>>>>>> for (i = 0; i < adev->num_ip_blocks; i++) { > >>>>>>>>> if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw) > >>>>>>>>> continue; > >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c > >>>>>>>>> index b3fba8dea63c..3ac7fef74277 100644 > >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c > >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c > >>>>>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle) > >>>>>>>>> nv_program_aspm(adev); > >>>>>>>>> /* setup nbio registers */ > >>>>>>>>> adev->nbio.funcs->init_registers(adev); > >>>>>>>>> - /* remap HDP registers to a hole in mmio space, > >>>>>>>>> - * for the purpose of expose those registers > >>>>>>>>> - * to process space > >>>>>>>>> - */ > >>>>>>>>> - if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev)) > >>>>>>>>> - adev->nbio.funcs->remap_hdp_registers(adev); > >>>>>>>>> /* enable the doorbell aperture */ > >>>>>>>>> nv_enable_doorbell_aperture(adev, true); > >>>>>>>>> > >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>>>>> index fde6154f2009..a0481e37d7cf 100644 > >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle) > >>>>>>>>> soc15_program_aspm(adev); > >>>>>>>>> /* setup nbio registers */ > >>>>>>>>> adev->nbio.funcs->init_registers(adev); > >>>>>>>>> - /* remap HDP registers to a hole in mmio space, > >>>>>>>>> - * for the purpose of expose those registers > >>>>>>>>> - * to process space > >>>>>>>>> - */ > >>>>>>>>> - if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev)) > >>>>>>>>> - adev->nbio.funcs->remap_hdp_registers(adev); > >>>>>>>>> > >>>>>>>>> /* enable the doorbell aperture */ > >>>>>>>>> soc15_enable_doorbell_aperture(adev, true); > >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>>>>> index 55284b24f113..16b447055102 100644 > >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle) > >>>>>>>>> soc21_program_aspm(adev); > >>>>>>>>> /* setup nbio registers */ > >>>>>>>>> adev->nbio.funcs->init_registers(adev); > >>>>>>>>> - /* remap HDP registers to a hole in mmio space, > >>>>>>>>> - * for the purpose of expose those registers > >>>>>>>>> - * to process space > >>>>>>>>> - */ > >>>>>>>>> - if (adev->nbio.funcs->remap_hdp_registers) > >>>>>>>>> - adev->nbio.funcs->remap_hdp_registers(adev); > >>>>>>>>> /* enable the doorbell aperture */ > >>>>>>>>> soc21_enable_doorbell_aperture(adev, true); > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> 2.25.1 > >>>>>>>>>