On 2/28/22 08:30, Thorsten Leemhuis wrote:
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.
Yan-Hsuan Chuang, sorry, I failed to notice that you didn't get the
regression report below. Could you take a look at what's wrong there?
BTW: Nico, did you try another bisection? And is the problem still
happening? Did you maybe give 5.17-rc a shot to check if the problem
still happens there?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.
#regzbot poke
On 15.02.22 09:25, Thorsten Leemhuis wrote:
[...]
On 14.02.22 20:25, Nico Sneck wrote:
Hi,
I'm running Fedora 35 on a Huawei Matestation S (HUAWEI PUM-WDX9), AMD
Renoir with Realtek rtw_8822ce handling wifi stuff.
Ever since the kernel update from 5.15.13-200.fc35 to 5.16.8-200.fc35
(which I performed Feb 12th), I noticed that my Wifi connection
doesn't really work anymore. I'm connecting to a Zyxel VMG3927-B50A,
and it always appears to use a 5 GHz connection. I also confirmed that
5.17-rc4 suffers from this issue.
The issue is that even trying to ping my router's gateway address will
result in connection timeouts, and ping times are in the thousands to
tens of thousands of milliseconds (normally peak ping times are ~3-6
ms), making wireless unusable with 5.16+.
I can also see that in dmesg logs there are two types of rtw_8822ce
driver warnings flooding the logs, which I didn't see with 5.15:
"helmi 13 18:20:03 fedora kernel: rtw_8822ce 0000:06:00.0: timed out to flush queue {1,2}"
"helmi 13 18:16:23 fedora kernel: rtw_8822ce 0000:06:00.0: failed to get tx report from firmware"
Some stats:
On kernel 5.15.13-200.fc35 running for 29 days:
[nico@fedora ~]$ journalctl -k -b -18 | grep 'timed out to flush queue' | wc -l
0
[nico@fedora ~]$ journalctl -k -b -18 | grep 'failed to get tx report from firmware' | wc -l
0
On kernel 5.16.8-200.fc35 running for 4 hours:
[nico@fedora ~]$ journalctl -k -b -17 | grep 'timed out to flush queue' | wc -l
45370
[nico@fedora ~]$ journalctl -k -b -17 | grep 'failed to get tx report from firmware' | wc -l
502
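Both counts for a boot can be gathered in one pass. This is a minimal
sketch, assuming the kernel log is piped in on stdin; the function name
count_rtw_warnings is made up for illustration and is not part of any
existing tool:

```shell
# count_rtw_warnings (hypothetical helper): reads kernel log lines on
# stdin and prints how many times each of the two warnings appears.
count_rtw_warnings() {
    awk '
        /timed out to flush queue/              { flush++ }
        /failed to get tx report from firmware/ { txrep++ }
        END {
            # %d prints 0 for a counter that never matched
            printf "flush_timeouts=%d tx_report_failures=%d\n", flush, txrep
        }'
}

# Usage for a given boot, same boot offsets as above:
#   journalctl -k -b -17 | count_rtw_warnings
```

Reading from stdin means the same function works on saved log files as
well as on live journalctl output.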
I tried bisecting which commit introduced this regression, but after
some 12 hours of recompiling and testing, it seems like I failed
somehow. I tried a bisect with first known good revision as
8bb7eca972ad (5.15 release commit), and first known bad revision as
df0cc57e057f (5.16 release commit). I managed to identify that
revision
fc02cb2b37fe Merge tag 'net-next-for-5.16' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
is bad, but then all other revisions were good apart from
8a33dcc2f6d5 (refs/bisect/bad) Merge
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
which was also bad.
But here's the baffling part: commit 6b278c0cb378 was good, and it's
the last commit in the merge (8a33dcc2f6d5) which appeared bad.
Now I retested with 8a33dcc2f6d5, and I don't see the issues anymore,
so I guess I tested the wrong kernel version at that point or something.
shrug.
So I can only assume that the regression came in one of the commits inside
fc02cb2b37fe Merge tag 'net-next-for-5.16' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
but it'll take me a while to try bisecting the commits in that merge again.
If anyone has any idea about what could cause these issues I'm seeing,
I can try out patches / test different things. But I'll try
bisecting this again soon.
Nico,
Your use of rtw_8822ce in the title finally registered with me. That driver
name means that you are using my GitHub repo. Newer kernels have the driver
built in, but under names such as rtw88_8822ce. The difference in the names
is deliberate. If you want to use the GitHub version, you must blacklist the
ones from the kernel.
To check this, run 'lsmod | grep 88'. If you see a mixture of rtw_xxx and
rtw88_xxx, then this is your problem.
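
A minimal sketch of that check, assuming the GitHub modules all start with
rtw_ and the in-kernel ones with rtw88_ as described above;
classify_rtw_modules is a hypothetical helper name, not an existing tool:

```shell
# classify_rtw_modules (hypothetical helper): reads lsmod-style output on
# stdin and reports which family of rtw modules is loaded.
classify_rtw_modules() {
    awk '
        $1 ~ /^rtw88_/ { in_kernel = 1 }    # in-kernel naming
        $1 ~ /^rtw_/   { out_of_tree = 1 }  # GitHub repo naming
        END {
            if (in_kernel && out_of_tree) print "mixed"
            else if (in_kernel)           print "in-kernel only"
            else if (out_of_tree)         print "out-of-tree only"
            else                          print "none"
        }'
}

# Usage on a live system:
#   lsmod | classify_rtw_modules
```

A result of "mixed" corresponds to the problem case described above.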
Larry