Re: QAIC reset failure

On 1/16/2024 9:58 AM, Baruch Siach wrote:
Hi qaic driver maintainers,

Sorry, I was on holiday last week and am just now catching up on email and seeing this.

I am testing an A100 device on an arm64 platform. Kernel version is current
Linus master as of commit 052d534373b7. The driver is unable to reset
the device properly.

[  137.706765] pci 0000:01:00.0: enabling device (0000 -> 0002)
[  137.712528] pci 0000:02:00.0: enabling device (0000 -> 0002)
[  137.718230] qaic 0000:03:00.0: enabling device (0000 -> 0002)
[  137.725720] [drm] Initialized qaic 0.0.0 20190618 for 0000:03:00.0 on minor 0
[  137.734326] mhi mhi0: Requested to power ON
[  137.738520] mhi mhi0: Power on setup success
[  137.855108] mhi mhi0: Wait for device to enter SBL or Mission mode

This all looks good.

[  137.861578] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to receive START channel command completion
[  137.870733] qaic_timesync mhi0_QAIC_TIMESYNC: 21: Failed to reset channel, still resetting
[  137.879063] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to reset channel, still resetting
[  137.887334] qaic_timesync: probe of mhi0_QAIC_TIMESYNC failed with error -5
[  137.894866] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to receive START channel command completion
[  137.904006] qaic_timesync mhi0_QAIC_TIMESYNC: 21: Failed to reset channel, still resetting
[  137.912263] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to reset channel, still resetting
[  137.920517] qaic_timesync: probe of mhi0_QAIC_TIMESYNC failed with error -5
[  140.807091] mhi mhi0: Device failed to enter MHI Ready
[  143.695094] mhi mhi0: Device failed to enter MHI Ready

This looks like the device stopped responding to the host early in boot. Trying to access channels while the device is not in the MHI Ready state is odd.
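
To make the ordering in the log above easier to see, here is a minimal sketch that pulls the timestamps out of dmesg-style lines and compares when the first channel START failure and the first "failed to enter MHI Ready" message were logged. The helper names (parse_dmesg, first_time) and the regular expression are assumptions made for illustration; they are not part of the qaic or MHI code.

```python
import re

# Matches lines such as "[  137.855108] mhi mhi0: ..." (format assumed from the log above).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def parse_dmesg(lines):
    """Return a list of (timestamp_seconds, message) tuples for lines that match."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events

def first_time(events, needle):
    """Return the timestamp of the first message containing needle, or None."""
    for ts, msg in events:
        if needle in msg:
            return ts
    return None

# Three lines copied from the log quoted above.
log = [
    "[  137.855108] mhi mhi0: Wait for device to enter SBL or Mission mode",
    "[  137.861578] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to receive START channel command completion",
    "[  140.807091] mhi mhi0: Device failed to enter MHI Ready",
]

events = parse_dmesg(log)
chan_fail = first_time(events, "Failed to receive START channel command completion")
not_ready = first_time(events, "Device failed to enter MHI Ready")

# True when the channel START failure was logged before the MHI Ready failure.
print(chan_fail < not_ready)
```

On the quoted log, the channel START failure is logged roughly three seconds before the first MHI Ready failure.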

This is with firmware from SDK version 1.12.2.0. I also tried version
1.10.0.193 with similar results.

Some more state information from MHI debugfs below.

/sys/kernel/debug/mhi/mhi0/regdump:
Host PM state: SYS ERROR Process Device state: RESET EE: DISABLE
Device EE: PRIMARY BOOTLOADER state: SYS ERROR
MHI_REGLEN: 0x100
MHI_VER: 0x1000000
MHI_CFG: 0x8000000
MHI_CTRL: 0x0
MHI_STATUS: 0xff04
MHI_WAKE_DB: 0x1
BHI_EXECENV: 0x0
BHI_STATUS: 0xa93f0935
BHI_ERRCODE: 0x0
BHI_ERRDBG1: 0xc0300000
BHI_ERRDBG2: 0xb
BHI_ERRDBG3: 0xcabb0

This suggests that the device crashed, which is unexpected.
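
For reference, a minimal sketch of how the MHI_STATUS value in the dump can be decoded. It assumes the bit layout used by the Linux MHI headers (READY in bit 0, SYSERR in bit 2, MHI state in bits 15:8) and the state encoding from the same headers; the helper name decode_mhi_status is hypothetical.

```python
# Bit layout assumed from the Linux MHI bus headers; verify against your kernel tree.
MHISTATUS_READY_MASK = 1 << 0
MHISTATUS_SYSERR_MASK = 1 << 2
MHISTATUS_MHISTATE_SHIFT = 8
MHISTATUS_MHISTATE_MASK = 0xFF << MHISTATUS_MHISTATE_SHIFT

# MHI state encoding (assumed to match the kernel's enum mhi_state).
MHI_STATES = {
    0x00: "RESET",
    0x01: "READY",
    0x02: "M0",
    0x03: "M1",
    0x04: "M2",
    0x05: "M3",
    0x06: "M3_FAST",
    0x07: "BHI",
    0xFF: "SYS_ERR",
}

def decode_mhi_status(value):
    """Split a raw MHI_STATUS register value into its fields."""
    state = (value & MHISTATUS_MHISTATE_MASK) >> MHISTATUS_MHISTATE_SHIFT
    return {
        "ready": bool(value & MHISTATUS_READY_MASK),
        "syserr": bool(value & MHISTATUS_SYSERR_MASK),
        "state": MHI_STATES.get(state, hex(state)),
    }

# Value taken from the regdump above.
print(decode_mhi_status(0xFF04))
```

Under these assumptions, 0xff04 decodes to READY clear, SYSERR set, and MHI state SYS_ERR, which is consistent with the device having crashed.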

/sys/kernel/debug/mhi/mhi0/states:
PM state: SYS ERROR Process Device: Inactive MHI state: RESET EE: DISABLE wake: true
M0: 2 M2: 0 M3: 0 device wake: 0 pending packets: 0

Any ideas?

We may need our firmware engineers involved. I think there is already a thread with some of the POCs involved.

-Jeff



