On 8/19/21 10:52 AM, Hui Wang wrote:
On 8/19/21 3:49 PM, Marek Vasut wrote:
On 8/19/21 7:31 AM, Greg Kroah-Hartman wrote:
On Thu, Aug 19, 2021 at 10:57:03AM +0800, Hui Wang wrote:
On 8/18/21 5:04 PM, Marek Vasut wrote:
On 8/18/21 7:33 AM, Greg Kroah-Hartman wrote:
On Wed, Aug 18, 2021 at 12:06:15PM +0800, Hui Wang wrote:
Hi Marex,
We backported this patch to the Ubuntu 4.15.0-generic kernel and found that it makes the rsi driver crash on system resume on the Dell 300x IoT platform (100% reproducible). Below is the log. After this log appears, the rsi wifi no longer works, and we need to run 'rmmod rsi_sdio; modprobe rsi_sdio' to make it work again.
So do you know what is missing apart from this patch, or is this patch not suitable for the 4.15 kernel at all?
Does 4.19.191 work for this system? Why not just use that or newer instead?
I haven't seen this on linux-stable 5.4.y or 5.10.y, if that information is of any use.
But I have to admit, I am tempted to mark the whole driver as BROKEN and submit that for stable backports.
Because that is what it is: it is buggy, broken, and the hardware lacks any documentation. I spent an insane amount of time talking to RedPine Signals / SiLabs trying to get help with basic things like association problems against various APs, with no result. I tried getting hardware docs from them so I could fix the driver myself, with no result either. So far I have tried to pick various fixes from their downstream driver and submit them, but that is massively time-consuming, and the changes there are not separated or documented; it is just one large chunk of code.
As far as I can tell, they also have no interest in fixing the driver or helping others fix it, so maybe we should just mark it as broken ... :-(
Hi Marek,
Got it, thanks for sharing it.
Hi Greg,
I just tested 4.19.191 and got the same result: the wifi crashes after resume under 4.19.191:
admin@HW6VB02:~$ uname -a
Linux HW6VB02 4.19.191 #1 SMP Thu Aug 19 10:19:32 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
[ 59.682908] sdhci-acpi INT33BB:00: pre_suspend failed for non-removable host: -38
[ 59.682917] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 59.686063] OOM killer disabled.
[ 59.686065] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 59.687385] Suspending console(s) (use no_console_suspend to debug)
[ 59.687931] rsi_91x: ===> Interface DOWN <===
[ 70.068983] mmc1: Controller never released inhibit bit(s).
[ 70.068992] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 70.069002] mmc1: sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 70.069009] mmc1: sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff
[ 70.069016] mmc1: sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[ 70.069023] mmc1: sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 70.069030] mmc1: sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
[ 70.069036] mmc1: sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff
[ 70.069043] mmc1: sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff
So let us revert this commit from 4.19.y?
If you revert it, does it work properly? What about in Linus's tree?
I reverted the commit in 4.19.191, and then the wifi worked both before and after system resume. I also tested the mainline kernel linux-5.13: before suspend the wifi worked, but after suspend the whole system couldn't wake up. I couldn't recover the system because I have no physical access to the machine; I did all tests remotely via ssh. So there is no test result for Linus' tree.
I suspect you just hit the issue this patch was trying to fix then.
If you have console access, use no_console_suspend to see the backtrace on wake up.
I suspect in that case, sdio_claim_host() will spin indefinitely and never finish; see the c434e5e48dc4e ("rsi: Use resume_noirq for SDIO") commit message.
At least, we never saw this issue on kernel 4.15: without commit c434e5e48dc4e ("rsi: Use resume_noirq for SDIO"), wifi and bluetooth work well before and after suspend.
I suspect you might've just been lucky with that, because it seems RSI did hit it too (see below). This could also be something that triggers only on specific controller drivers (?).
Note that I did my tests on ARM MMCI (stm32mp1 variant).
The platform I am testing is an x86 one, and the SDHCI controller driver is sdhci_acpi.c.
Do you have an RSI module which can be plugged into an SD card slot there, or is that RSI module soldered onto some devkit/board?
Mine is the latter, soldered onto a SoM, so I have a hard time testing on other SDIO controllers.
This "[ 70.068983] mmc1: Controller never released inhibit bit(s)"
looks suspicious in the log above.
Also, newer versions of the RSI downstream driver [1], as of 390542d ("Updated Readme.txt file"), simply comment out rsi_sdio_enable_interrupts() in rsi_resume() in rsi/rsi_91x_sdio.c, which looks like RSI ran into the same problem but "fixed" it differently.
I think that approach RSI took is wrong and it just hid the issue.
[1] git://github.com/SiliconLabs/RS911X-nLink-OSD
The bottom line is, I would really prefer to figure out what the problem you see on Linux 5.13.y is, fix that, and backport the fix, so that suspend/resume works correctly for everyone, rather than revert a patch without really understanding the underlying problem.
Sadly, the RSI driver is buggy.