On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote: > > > > On 8/29/2022 10:20 PM, Alex Deucher wrote: > > On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@xxxxxxx> wrote: > >> > >> HDP flush is used early in the init sequence as part of memory controller > >> block initialization. Hence remapping of HDP registers needed for flush > >> needs to happen earlier. > >> > >> This also fixes the Unsupported Request error reported through AER during > >> driver load. The error happens as a write happens to the remap offset > >> before real remapping is done. > >> > >> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&data=05%7C01%7Clijo.lazar%40amd.com%7C0882d00080124386814a08da89de9bcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637973886457404198%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=dC%2BCY22cfix1VCcQINrvNWI5XW%2BYV5lleJX3Ju9A6Iw%3D&reserved=0 > >> > >> The error was unnoticed before and got visible because of the commit > >> referenced below. This doesn't fix anything in the commit below, rather > >> fixes the issue in amdgpu exposed by the commit. The reference is only > >> to associate this commit with below one so that both go together. > >> > >> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") > >> > >> Reported-by: Tom Seewald <tseewald@xxxxxxxxx> > >> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> > >> Cc: stable@xxxxxxxxxxxxxxx > > > > How about something like the attached patch rather than these two > > patches? It's a bit bigger but seems cleaner and more defensive in my > > opinion. > > > > Whenever device goes to suspend/reset and then comes back, remap offset > has to be set back to 0 to make sure it doesn't use the wrong offset > when the register assumes default values again. > > To avoid the if-check in hdp_flush (which is more frequent), another way > is to initialize the remap offset to default offset during early init > and hw fini/suspend sequences. It won't be obvious (even with this > patch) as to when remap offset vs default offset is used though. On resume, the common IP is resumed first so it will always be set. The only case that is a problem is init because we init GMC out of order. We could init common before GMC in amdgpu_device_ip_init(). I think that should be fine, but I wasn't sure if there might be some fallout from that on certain cards. Alex > > Thanks, > Lijo > > > Alex > > > >> --- > >> v2: > >> Take care of IP resume cases (Alex Deucher) > >> Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling) > >> Add more details in commit message and associate with AER patch (Bjorn > >> Helgaas) > >> > >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++ > >> drivers/gpu/drm/amd/amdgpu/nv.c | 6 ------ > >> drivers/gpu/drm/amd/amdgpu/soc15.c | 6 ------ > >> drivers/gpu/drm/amd/amdgpu/soc21.c | 6 ------ > >> 4 files changed, 24 insertions(+), 18 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> index ce7d117efdb5..e420118769a5 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) > >> return 0; > >> } > >> > >> +/** > >> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization > >> + * > >> + * @adev: amdgpu_device pointer > >> + * > >> + * Any common hardware initialization sequence that needs to be done before > >> + * hw init of individual IPs is performed here. This is different from the > >> + * 'common block' which initializes a set of IPs. > >> + */ > >> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev) > >> +{ > >> + /* Remap HDP registers to a hole in mmio space, for the purpose > >> + * of exposing those registers to process space. This needs to be > >> + * done before hw init of ip blocks to take care of HDP flush > >> + * operations through registers during hw_init. > >> + */ > >> + if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers && > >> + !amdgpu_sriov_vf(adev)) > >> + adev->nbio.funcs->remap_hdp_registers(adev); > >> +} > >> > >> /** > >> * amdgpu_device_ip_init - run init for hardware IPs > >> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) > >> DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r); > >> goto init_failed; > >> } > >> + > >> + amdgpu_device_prepare_ip(adev); > >> r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev); > >> if (r) { > >> DRM_ERROR("hw_init %d failed %d\n", i, r); > >> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev) > >> AMD_IP_BLOCK_TYPE_IH, > >> }; > >> > >> + amdgpu_device_prepare_ip(adev); > >> for (i = 0; i < adev->num_ip_blocks; i++) { > >> int j; > >> struct amdgpu_ip_block *block; > >> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev) > >> { > >> int i, r; > >> > >> + amdgpu_device_prepare_ip(adev); > >> for (i = 0; i < adev->num_ip_blocks; i++) { > >> if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw) > >> continue; > >> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c > >> index b3fba8dea63c..3ac7fef74277 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/nv.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c > >> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle) > >> nv_program_aspm(adev); > >> /* setup nbio registers */ > >> adev->nbio.funcs->init_registers(adev); > >> - /* remap HDP registers to a hole in mmio space, > >> - * for the purpose of expose those registers > >> - * to process space > >> - */ > >> - if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev)) > >> - adev->nbio.funcs->remap_hdp_registers(adev); > >> /* enable the doorbell aperture */ > >> nv_enable_doorbell_aperture(adev, true); > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c > >> index fde6154f2009..a0481e37d7cf 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c > >> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle) > >> soc15_program_aspm(adev); > >> /* setup nbio registers */ > >> adev->nbio.funcs->init_registers(adev); > >> - /* remap HDP registers to a hole in mmio space, > >> - * for the purpose of expose those registers > >> - * to process space > >> - */ > >> - if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev)) > >> - adev->nbio.funcs->remap_hdp_registers(adev); > >> > >> /* enable the doorbell aperture */ > >> soc15_enable_doorbell_aperture(adev, true); > >> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c > >> index 55284b24f113..16b447055102 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c > >> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle) > >> soc21_program_aspm(adev); > >> /* setup nbio registers */ > >> adev->nbio.funcs->init_registers(adev); > >> - /* remap HDP registers to a hole in mmio space, > >> - * for the purpose of expose those registers > >> - * to process space > >> - */ > >> - if (adev->nbio.funcs->remap_hdp_registers) > >> - adev->nbio.funcs->remap_hdp_registers(adev); > >> /* enable the doorbell aperture */ > >> soc21_enable_doorbell_aperture(adev, true); > >> > >> -- > >> 2.25.1 > >>