On 1/16/2024 9:58 AM, Baruch Siach wrote:
Hi qaic driver maintainers,
Sorry, I was on holiday last week and am just now catching up on email and seeing this.
I am testing an A100 device on arm64 platform. Kernel version is current Linus master as of commit 052d534373b7. The driver is unable to reset the device properly.

[ 137.706765] pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 137.712528] pci 0000:02:00.0: enabling device (0000 -> 0002)
[ 137.718230] qaic 0000:03:00.0: enabling device (0000 -> 0002)
[ 137.725720] [drm] Initialized qaic 0.0.0 20190618 for 0000:03:00.0 on minor 0
[ 137.734326] mhi mhi0: Requested to power ON
[ 137.738520] mhi mhi0: Power on setup success
[ 137.855108] mhi mhi0: Wait for device to enter SBL or Mission mode
This all looks good
[ 137.861578] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to receive START channel command completion
[ 137.870733] qaic_timesync mhi0_QAIC_TIMESYNC: 21: Failed to reset channel, still resetting
[ 137.879063] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to reset channel, still resetting
[ 137.887334] qaic_timesync: probe of mhi0_QAIC_TIMESYNC failed with error -5
[ 137.894866] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to receive START channel command completion
[ 137.904006] qaic_timesync mhi0_QAIC_TIMESYNC: 21: Failed to reset channel, still resetting
[ 137.912263] qaic_timesync mhi0_QAIC_TIMESYNC: 20: Failed to reset channel, still resetting
[ 137.920517] qaic_timesync: probe of mhi0_QAIC_TIMESYNC failed with error -5
[ 140.807091] mhi mhi0: Device failed to enter MHI Ready
[ 143.695094] mhi mhi0: Device failed to enter MHI Ready
This looks like the device stopped responding to the host, early in boot. Trying to access channels while the device is not in MHI Ready state is odd.
This is with firmware from SDK version 1.12.2.0. I tried also version 1.10.0.193 with similar results. Some more state information from MHI debugfs below.

/sys/kernel/debug/mhi/mhi0/regdump:
Host PM state: SYS ERROR Process Device state: RESET EE: DISABLE
Device EE: PRIMARY BOOTLOADER state: SYS ERROR
MHI_REGLEN: 0x100
MHI_VER: 0x1000000
MHI_CFG: 0x8000000
MHI_CTRL: 0x0
MHI_STATUS: 0xff04
MHI_WAKE_DB: 0x1
BHI_EXECENV: 0x0
BHI_STATUS: 0xa93f0935
BHI_ERRCODE: 0x0
BHI_ERRDBG1: 0xc0300000
BHI_ERRDBG2: 0xb
BHI_ERRDBG3: 0xcabb0
This suggests that the device crashed, which is unexpected.
/sys/kernel/debug/mhi/mhi0/states:
PM state: SYS ERROR Process Device: Inactive MHI state: RESET EE: DISABLE wake: true
M0: 2 M2: 0 M3: 0 device wake: 0 pending packets: 0

Any idea?
We may need our firmware engineers involved. I think there is already a thread with some of the POCs involved.
-Jeff