Wen Gong <wgong@xxxxxxxxxxxxxx> writes:

> When simulate random transfer fail for sdio write and read, it happened
> "Could not init core: -110" and then wlan up fail.
>
> Test steps:
> 1. Add config and update kernel:
> CONFIG_FAIL_MMC_REQUEST=y
> CONFIG_FAULT_INJECTION=y
> CONFIG_FAULT_INJECTION_DEBUG_FS=y
>
> 2. Run simulate fail:
> cd /sys/kernel/debug/mmc1/fail_mmc_request
> echo 100 > probability
> echo 1000 > interval
> echo 1 > times && enter system suspend
> press power button to wakeup system
>
> 3. It happened Could not init core: -110.
> [ 66.432068] ath10k_sdio mmc1:0001:1: unable to send the bmi data to the device: -110
> [ 66.440012] ath10k_sdio mmc1:0001:1: unable to write to the device
> [ 66.453375] ath10k_sdio mmc1:0001:1: Could not init core: -110

[...]

> Add retry mechanism for ath10k_start to make sure wlan up success.

I'm not convinced about this. ath10k assumes that the SDIO bus works
reliably and that there is no data loss. In my opinion, if the SDIO bus
is not working reliably, we should fail immediately with a clear error
message for the user instead of having an unstable connection. And I
understand from the logs that ath10k fails cleanly in this simulated
failure.

So what you do here is ignore the assumption that the SDIO bus always
works reliably and add a workaround: restart the firmware multiple
times and hope that, by luck, one of the 10 retry attempts succeeds.
But then what? Isn't the WLAN connection still flaky, since the SDIO
bus is not reliable? If we were to follow that design logic, shouldn't
we add retries for _all_ ath10k SDIO transactions? But that would make
ath10k even more complex than it already is.

Because I think this patch makes things worse for the user, I would
like to understand the real-life use case this patch is trying to fix
and how it would help the user.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches