I added a check of those two registers and rebooted 20 units 10 times. Failure rate was consistent and the traces were all like this:

[ 30.227933] rtx_88 failed in download_firmware_validate Support information:
[ 30.228392] rtx_88 0/50 0x1C4: 0, 0x10FC: 0
[ 30.244149] rtx_88 1/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.251142] rtx_88 2/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.258269] rtx_88 3/50 0x1C4: fe000000, 0x10FC: 800350f5
[ 30.265399] rtx_88 4/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.272388] rtx_88 5/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.279387] rtx_88 6/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.286387] rtx_88 7/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.293392] rtx_88 8/50 0x1C4: fe000000, 0x10FC: 800350f5
[ 30.300386] rtx_88 9/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.307387] rtx_88 10/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.314518] rtx_88 11/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.321654] rtx_88 12/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.329913] rtx_88 13/50 0x1C4: fe000000, 0x10FC: 800350f6
[ 30.338722] rtx_88 14/50 0x1C4: fe000000, 0x10FC: 800350a6

The pattern and addresses continue and are the same on any device that fails. Going on your statement that 0x10FC is a PC-like register, it looks like it’s caught in an infinite loop.

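To put a number on that observation, here is a minimal user-space sketch (not driver code) that takes the 0x10FC samples 1 through 14 from the trace above and reports how many distinct values appear and how far apart the lowest and highest are. The helper names count_distinct and sample_span are made up for this illustration, and leaving out sample 0 (which read as 0) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0x10FC samples 1..14 copied from the trace; sample 0 (value 0) is omitted. */
static const uint32_t pc_samples[] = {
	0x800350a6, 0x800350a6, 0x800350f5, 0x800350a6, 0x800350a5,
	0x800350a5, 0x800350a5, 0x800350f5, 0x800350a5, 0x800350a5,
	0x800350a5, 0x800350a6, 0x800350f6, 0x800350a6,
};

#define PC_SAMPLE_COUNT (sizeof(pc_samples) / sizeof(pc_samples[0]))

/* Count distinct values among n samples (O(n^2), fine for ~50 samples). */
static size_t count_distinct(const uint32_t *s, size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an earlier occurrence of s[i]. */
		for (j = 0; j < i; j++)
			if (s[j] == s[i])
				break;
		if (j == i)	/* none found: first time this value appears */
			distinct++;
	}
	return distinct;
}

/* Difference between the highest and lowest sampled value. */
static uint32_t sample_span(const uint32_t *s, size_t n)
{
	uint32_t lo, hi;

	assert(n > 0);
	lo = hi = s[0];
	for (size_t i = 1; i < n; i++) {
		if (s[i] < lo)
			lo = s[i];
		if (s[i] > hi)
			hi = s[i];
	}
	return hi - lo;
}
```

For these 14 samples, count_distinct(pc_samples, PC_SAMPLE_COUNT) returns 4 and sample_span(pc_samples, PC_SAMPLE_COUNT) returns 0x51, so every read lands in the same small window. That is consistent with a tight loop, though on its own it does not prove one.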
Sean

> On Jul 17, 2023, at 3:52 AM, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>
>
>
>> -----Original Message-----
>> From: Sean Mollet <sean@xxxxxxxxxxxx>
>> Sent: Monday, July 17, 2023 10:24 AM
>> To: Ping-Ke Shih <pkshih@xxxxxxxxxxx>
>> Cc: Larry Finger <Larry.Finger@xxxxxxxxxxxx>; linux-wireless@xxxxxxxxxxxxxxx
>> Subject: [RFC] RTW88 firmware download issues - improvement, but not perfect
>>
>> On Jul 16, 2023, at 9:05 PM, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>>>> @@ -794,15 +794,15 @@ static int __rtw_download_firmware(struct rtw_dev *rtwdev,
>>>>>
>>>>>         wlan_cpu_enable(rtwdev, true);
>>>>>
>>>>> -       if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
>>>>> -               ret = -EBUSY;
>>>>> -               goto dlfw_fail;
>>>>> -       }
>>>>> -
>>>>>         ret = download_firmware_validate(rtwdev);
>>>>>         if (ret)
>>>>>                 goto dlfw_fail;
>>>>>
>>>>> +       if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
>>>>> +               ret = -EBUSY;
>>>>> +               goto dlfw_fail;
>>>>> +       }
>>>>> +
>>>
>>> This looks reason to restore 0x38 after validating firmware. Do you have a result
>>> how this change can improve?
>>>
>>
>> Using a Pi 4 CM as host, this reduces failures from 1 in 5 to 1 in 20.
>>
>> I don’t know why, but it makes a measurable difference.
>
> I will check this with my colleague to see if we can apply this change.
>
>>
>>>>>         /* reset desc and index */
>>>>>         rtw_hci_setup(rtwdev);
>>>>>
>>>>> diff --git a/util.c b/util.c
>>>>> index ff3c269..fbd6599 100644
>>>>> --- a/util.c
>>>>> +++ b/util.c
>>>>> @@ -10,11 +10,11 @@ bool check_hw_ready(struct rtw_dev *rtwdev, u32 addr, u32 mask, u32 target)
>>>>>  {
>>>>>         u32 cnt;
>>>>>
>>>>> -       for (cnt = 0; cnt < 1000; cnt++) {
>>>>> +       for (cnt = 0; cnt < 5000; cnt++) {
>>>>>                 if (rtw_read32_mask(rtwdev, addr, mask) == target)
>>>>>                         return true;
>>>>>
>>>>> -               udelay(10);
>>>>> +               udelay(50);
>>>
>>> I look into the latest vendor driver, it shows that cnt becomes 10,000 and delay
>>> is 50us as your change.
>> Interesting. Is it possible that the real problem is simply not waiting long enough?
>>
>> Can you share some details of what the chip is doing and how long it should take?
>>
>
> It seems like I misread the code, the latest version is 5,000 as you mentioned.
>
> If failed to polling ready, please read and print out 0x1C4 and 0x10fc 20 times
> with 1ms or more delay. These store firmware PC-like address, so we can check
> if firmware is running or getting stuck.
>
> Ping-Ke
>