Re: [Bug 216373] New: Uncorrected errors reported for AMD GPU

On 8/25/2022 1:04 PM, Christian König wrote:
On 25.08.22 at 08:40, Stefan Roese wrote:
On 24.08.22 16:45, Tom Seewald wrote:
On Wed, Aug 24, 2022 at 12:11 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
Unfortunately, I don't have any NV platforms to test. Attached is an
'untested-patch' based on your trace logs.

Thanks,
Lijo

Thank you for the patch. It applied cleanly to v6.0-rc2 and after
booting that kernel I no longer see any messages about PCI errors. I
have uploaded a dmesg log to the bug report:
https://bugzilla.kernel.org/attachment.cgi?id=301642

I did not follow this thread in depth, but FWICT the bug is solved now
with this patch. So is it correct that the now fully enabled AER
support in the PCI subsystem in v6.0 helped detect a bug in the AMD
GPU driver?

It looks like it, but I'm not 100% sure about the rationale behind it.

Lijo can you explain more on this?


From the trace, during gmc hw_init it takes this route -

gart_enable -> amdgpu_gtt_mgr_recover -> amdgpu_gart_invalidate_tlb -> amdgpu_device_flush_hdp -> amdgpu_asic_flush_hdp (non-ring based HDP flush)

The HDP flush is done using the remapped offset, which is MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)

WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);

However, the remapping has not yet been done at this point; it is done later, during common block initialization. The access to the unmapped offset '(0x80000 - PAGE_SIZE)' seems to come back as an Unsupported Request, which is then reported through AER.

In the patch, I just moved the remapping before gmc block initialization.
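
To illustrate the ordering issue, here is a minimal user-space sketch. It is not the patch and not driver code: struct fake_dev, fake_wreg32(), fake_remap_hdp(), fake_gmc_hw_init() and run_init() are hypothetical stand-ins, and the 4 KiB page size and the flush register sitting at byte offset 0 of the remap page are assumptions. It only models the point above: a write into the hole before the remap is programmed is counted as an unsupported request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES      4096u                         /* assumed 4 KiB pages */
#define MMIO_HOLE_END        0x80000u
#define MMIO_REG_HOLE_OFFSET (MMIO_HOLE_END - PAGE_SIZE_BYTES)
#define HDP_MEM_FLUSH_CNTL   0u  /* assumed byte offset inside the remap page */

/* Hypothetical stand-in for the device state relevant to this bug. */
struct fake_dev {
    bool     remap_done;           /* has the HDP register remap been programmed? */
    uint32_t remap_reg_offset;     /* models adev->rmmio_remap.reg_offset */
    unsigned unsupported_requests; /* writes that hit the still-unmapped hole */
};

/* Dword register index of the HDP flush register, as in the WREG32 call. */
static uint32_t hdp_flush_reg_index(const struct fake_dev *dev)
{
    uint32_t byte_off = dev->remap_reg_offset + HDP_MEM_FLUSH_CNTL;

    assert((byte_off & 3u) == 0); /* register offsets are dword aligned */
    return byte_off >> 2;
}

/* Models an MMIO write: the hole only responds once the remap is in place. */
static void fake_wreg32(struct fake_dev *dev, uint32_t reg_index, uint32_t value)
{
    uint32_t byte_off = reg_index << 2;
    bool in_hole = byte_off >= MMIO_REG_HOLE_OFFSET && byte_off < MMIO_HOLE_END;

    (void)value;
    if (in_hole && !dev->remap_done)
        dev->unsupported_requests++;
}

/* Models the remap step done during common block initialization. */
static void fake_remap_hdp(struct fake_dev *dev)
{
    dev->remap_done = true;
}

/* Models gmc hw_init ending in a non-ring based HDP flush. */
static void fake_gmc_hw_init(struct fake_dev *dev)
{
    fake_wreg32(dev, hdp_flush_reg_index(dev), 0);
}

/* Runs both steps in the chosen order; returns the unsupported request count. */
static unsigned run_init(bool remap_first)
{
    struct fake_dev dev = { false, MMIO_REG_HOLE_OFFSET, 0 };

    if (remap_first)
        fake_remap_hdp(&dev);
    fake_gmc_hw_init(&dev);
    if (!remap_first)
        fake_remap_hdp(&dev);
    return dev.unsupported_requests;
}
```

In this model, run_init(false) corresponds to the old order (gmc init first, remap later) and run_init(true) to the order after the patch.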

Thanks,
Lijo

Thanks,
Christian.


Thanks,
Stefan



