Re: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_0.2.1.0 test failed (-110)

Hi Alex,

Thanks for your prompt response.

On Mon, Feb 24 2025, Alex Deucher wrote:
> On Mon, Feb 24, 2025 at 8:51 AM Baruch Siach <baruch@xxxxxxxxxx> wrote:
>> I see this failure on probe when trying to bring up amdgpu on a new arm64
>> platform.  Kernel is v6.14-rc4, and aldebaran firmware is latest
>> (linux-firmware commit 4f47e84d06f9).
>>
>> Tested with these kernel command line parameters:
>>
>> amdgpu.vm_size=1 amdgpu.msi=1 amdgpu.gartsize=32 amdgpu.vramlimit=32 amdgpu.gttsize=32
>
> Why are you setting those?  Does the driver load ok if you do not
> specify those driver options?

The driver probe fails the same way with or without these options. I
added them to fit the driver into a limited address space.

>> I guess the "CP firmware version" warning is bogus. IP version for GC_HWIP is
>> 9.4.2.
>>
>> Any idea?
>
> Potentially the driver parameter combination is causing a problem, or
> your ARM SoC may not be PCIe compliant.  A lot of small SoCs just
> throw a PCIe bridge on the SoC without proper coherency in place
> between the CPU and the PCIe bus.  PCIe requires coherency with the
> CPU (i.e., the device can snoop the CPU's cache).

I'll take a look into PCIe coherency in this SoC.

Thanks for the hint.
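
As a first step for the coherency check, here is a minimal sketch in
Python. It assumes the platform boots with a device tree (not ACPI), so
the nodes are visible under /sys/firmware/devicetree/base. It lists
every node whose device_type is "pci" and reports whether that node or
one of its ancestors carries the standard dma-coherent property. The
function names are made up for illustration. The property only shows
what the firmware description claims; it does not prove that the
interconnect really snoops the CPU caches.

```python
import os
from pathlib import Path

# Standard location of the live device tree on DT-based Linux systems.
DT_ROOT = Path("/sys/firmware/devicetree/base")


def is_pci_host(node: Path) -> bool:
    """Return True if the node's device_type property is "pci"."""
    try:
        # String properties are stored NUL-terminated.
        return (node / "device_type").read_bytes().rstrip(b"\0") == b"pci"
    except OSError:
        return False


def is_dma_coherent(node: Path, root: Path) -> bool:
    """Look for dma-coherent on the node or any ancestor up to root."""
    current = node
    while True:
        if (current / "dma-coherent").exists():
            return True
        if current == root:
            return False
        current = current.parent


def pci_coherency_report(root: Path = DT_ROOT) -> dict[str, bool]:
    """Map each PCI host bridge node path to its dma-coherent status."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        node = Path(dirpath)
        if "device_type" in filenames and is_pci_host(node):
            path = "/" + node.relative_to(root).as_posix()
            report[path] = is_dma_coherent(node, root)
    return report


if __name__ == "__main__":
    if DT_ROOT.is_dir():
        for path, coherent in sorted(pci_coherency_report().items()):
            state = "dma-coherent" if coherent else "no dma-coherent property"
            print(f"{path}: {state}")
    else:
        print("no device tree exposed here (ACPI system?)")
```

A bridge reported without the property is only a starting point; the
SoC documentation still has to confirm what the hardware does.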

baruch

>> Relevant log snippets follow:
>>
>> [    1.792949] pci 0000:05:00.0: [1002:740f] type 00 class 0x038000 PCIe Endpoint
>> [    1.800652] pci 0000:05:00.0: BAR 0 [mem 0x00000000-0xfffffffff 64bit pref]
>> [    1.807629] pci 0000:05:00.0: BAR 2 [mem 0x00000000-0x001fffff 64bit pref]
>> [    1.814506] pci 0000:05:00.0: BAR 4 [io  0x0000-0x00ff]
>> [    1.819729] pci 0000:05:00.0: BAR 5 [mem 0x00000000-0x0007ffff]
>> [    1.825647] pci 0000:05:00.0: ROM [mem 0x00000000-0x0001ffff pref]
>> [    1.833297] pci 0000:05:00.0: PME# supported from D1 D2 D3hot D3cold
>> [    1.840118] pci 0000:05:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:02:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
>> [    1.857150] pci_bus 0000:05: busn_res: [bus 05-ff] end is updated to 05
>> ...
>> [    2.615336] pci 0000:05:00.0: BAR 0 [mem 0x1000000000-0x1fffffffff 64bit pref]: assigned
>> [    2.623529] pci 0000:05:00.0: BAR 2 [mem 0x2000000000-0x20001fffff 64bit pref]: assigned
>> [    2.631720] pci 0000:05:00.0: BAR 5 [mem 0x5d000000-0x5d07ffff]: assigned
>> [    2.638544] pci 0000:05:00.0: ROM [mem 0x5d080000-0x5d09ffff pref]: assigned
>> [    2.645583] pci 0000:05:00.0: BAR 4 [io  size 0x0100]: can't assign; no space
>> [    2.652707] pci 0000:05:00.0: BAR 4 [io  size 0x0100]: failed to assign
>> ...
>> [    3.153154] amdgpu 0000:05:00.0: enabling device (0000 -> 0002)
>> [    3.159112] [drm] initializing kernel modesetting (ALDEBARAN 0x1002:0x740F 0x1002:0x0C34 0x02).
>> [    3.167817] [drm] register mmio base: 0x5D000000
>> [    3.172425] [drm] register mmio size: 524288
>> [    3.176775] amdgpu 0000:05:00.0: amdgpu: detected ip block number 0 <soc15_common>
>> [    3.184341] amdgpu 0000:05:00.0: amdgpu: detected ip block number 1 <gmc_v9_0>
>> [    3.191558] amdgpu 0000:05:00.0: amdgpu: detected ip block number 2 <vega20_ih>
>> [    3.198858] amdgpu 0000:05:00.0: amdgpu: detected ip block number 3 <psp>
>> [    3.205639] amdgpu 0000:05:00.0: amdgpu: detected ip block number 4 <smu>
>> [    3.212421] amdgpu 0000:05:00.0: amdgpu: detected ip block number 5 <gfx_v9_0>
>> [    3.219635] amdgpu 0000:05:00.0: amdgpu: detected ip block number 6 <sdma_v4_0>
>> [    3.226935] amdgpu 0000:05:00.0: amdgpu: detected ip block number 7 <vcn_v2_6>
>> [    3.234149] amdgpu 0000:05:00.0: amdgpu: detected ip block number 8 <jpeg_v2_6>
>> [    3.247351] amdgpu 0000:05:00.0: amdgpu: Fetched VBIOS from ROM BAR
>> [    3.253626] amdgpu: ATOM BIOS: 113-D67301V-073
>> [    3.259731] [drm] CP firmware version too old, please update!
>> [    3.260400] amdgpu 0000:05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
>> [    3.274294] amdgpu 0000:05:00.0: amdgpu: PCIE atomic ops is not supported
>> [    3.281115] amdgpu 0000:05:00.0: amdgpu: MEM ECC is active.
>> [    3.286679] amdgpu 0000:05:00.0: amdgpu: SRAM ECC is active.
>> [    3.292351] amdgpu 0000:05:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[7ff7f] ras_mask[7ff7f]
>> [    3.303232] [drm] vm size is 1 GB, 2 levels, block size is 9-bit, fragment size is 9-bit
>> [    3.311338] amdgpu 0000:05:00.0: amdgpu: VRAM: 65520M 0x0000020000000000 - 0x0000020FFEFFFFFF (32M used)
>> [    3.320811] amdgpu 0000:05:00.0: amdgpu: GART: 32M 0x0000000000000000 - 0x0000000001FFFFFF
>> [    3.329070] [drm] Detected VRAM RAM=65520M, BAR=65536M
>> [    3.334199] [drm] RAM width 4096bits HBM
>> [    3.338251] [drm] amdgpu: 32M of VRAM memory ready
>> [    3.343039] [drm] amdgpu: 32M of GTT memory ready.
>> [    3.347861] [drm] GART: num cpu pages 8192, num gpu pages 8192
>> [    3.353779] [drm] PCIE GART of 32M enabled.
>> [    3.357955] [drm] PTB located at 0x0000020001FF0000
>> [    3.365901] [drm] Found VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 28
>> [    3.432199] amdgpu 0000:05:00.0: amdgpu: reserve 0x800000 from 0x20001000000 for PSP TMR
>> [    3.504497] amdgpu 0000:05:00.0: amdgpu: smu driver if version = 0x00000008, smu fw if version = 0x00000009, smu fw program = 0, smu fw version = 0x00443f00 (68.63.0)
>> [    3.519356] amdgpu 0000:05:00.0: amdgpu: SMU driver if version not matched
>> [    3.526265] amdgpu 0000:05:00.0: amdgpu: use vbios provided pptable
>> [    3.532523] amdgpu 0000:05:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.10
>> [    3.560964] amdgpu 0000:05:00.0: amdgpu: SMU is initialized successfully!
>> [    3.568167] [drm] kiq ring mec 2 pipe 1 q 0
>> [    3.785160] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_0.2.1.0 test failed (-110)
>> [    3.794825] [drm:amdgpu_gfx_enable_kcq] *ERROR* KCQ enable failed
>> [    3.800929] [drm:amdgpu_device_init] *ERROR* hw_init of IP block <gfx_v9_0> failed -110
>> [    3.808929] amdgpu 0000:05:00.0: amdgpu: amdgpu_device_ip_init failed
>> [    3.815361] amdgpu 0000:05:00.0: amdgpu: Fatal error during GPU init
>> [    3.821705] amdgpu 0000:05:00.0: amdgpu: amdgpu: finishing device.
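
For reference, -110 is -ETIMEDOUT on Linux, which is consistent with
the KIQ ring test timing out while waiting for the GPU. The sketch
below is a small Python helper (the names are hypothetical, not part of
any tool) that pulls the *ERROR* lines out of a log like the one above
and names the errno. The errno table is a hand-copied subset of the
Linux values, not a complete list.

```python
import re

# A few Linux errno values (asm-generic); extend as needed.
LINUX_ERRNO = {12: "ENOMEM", 16: "EBUSY", 22: "EINVAL", 110: "ETIMEDOUT"}

# Message after "*ERROR*", with an optional trailing "-N" or "(-N)" code.
ERROR_RE = re.compile(
    r"\*ERROR\*\s+(?P<msg>.*?)(?:\s+\(?(?P<code>-\d+)\)?)?$"
)


def summarize_errors(log_text):
    """Return (message, code, errno_name) for each *ERROR* line."""
    results = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if not match:
            continue
        raw = match.group("code")
        code = int(raw) if raw else None
        # Kernel functions return negative errno values.
        name = LINUX_ERRNO.get(-code, "unknown") if code is not None else None
        results.append((match.group("msg"), code, name))
    return results


if __name__ == "__main__":
    import sys

    for msg, code, name in summarize_errors(sys.stdin.read()):
        suffix = f" [{code} = -{name}]" if code is not None else ""
        print(f"{msg}{suffix}")
```

Running it with the log on standard input (for example piped from
dmesg) prints one line per error with the decoded code.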
>>
>> Thanks,
>> baruch
>>
>> --
>>                                                      ~. .~   Tk Open Systems
>> =}------------------------------------------------ooO--U--Ooo------------{=
>>    - baruch@xxxxxxxxxx - tel: +972.52.368.4656, http://www.tkos.co.il -

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@xxxxxxxxxx - tel: +972.52.368.4656, http://www.tkos.co.il -



