On 3/6/23 19:39, Ping-Ke Shih wrote:
-----Original Message-----
From: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>
Sent: Monday, March 6, 2023 9:00 PM
To: Larry Finger <Larry.Finger@xxxxxxxxxxxx>
Cc: Ping-Ke Shih <pkshih@xxxxxxxxxxx>; linux-wireless <linux-wireless@xxxxxxxxxxxxxxx>
Subject: Re: Performance of rtw88_8822bu
On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
Hi Larry,
On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
Sascha and Ping-Ke,
I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
code matches the code found in wireless-next. I created 9 files of 5.8 GiB
each and used a for loop to copy them from the test computer to/from my
server. The wireless connection is on the 5 GHz band (channel 153) connected
to an ax1500 Wifi 6 router, which in turn is connected to the server via a
1G ethernet cable. The connection has not crashed, but I see strange
behavior.
What chipset are you using? Is it a RTL8822bu or some other chipset
reported by the driver?
With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
for TX for about 1/3 of the time, but then the driver reports "timed out to
flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
These low rates are in effect for 2/3 of the time. The 5G bands are
relatively unused in my house, thus I do not suspect interference.
I've received a very similar report this weekend. About 3-4 messages per
second, "timed out to flush queue 3", but driver continues to work.
I've also seen it this morning by accident and once again while writing
this mail. This was on a RTL8821CU.
So far I have no idea what the problem might be.
The "timed out to flush queue %d\n" message comes from
__rtw_mac_flush_prio_queue(). Here some registers are read which show
the number of reserved pages for a queue and the number of available
pages of a queue. I used the debugfs interface to observe these
registers from time to time:
f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo "$i 4" > $f; cat $f; done
This is what they show:
reg 0x230: 0x00230040
reg 0x234: 0x00400040
reg 0x238: 0x00400040
reg 0x23c: 0x00000000
The upper 16 bits contain the number of available pages and the lower
16 bits contain the number of reserved pages (note that these are the
registers on an RTL8822CU; on other chipsets the number of available
pages is lower, like 0x10 on RTL8821CU). Register 0x230 is the
interesting one for us, it has the values for queue 3.
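As a small illustration of the bit layout just described, the sketch below
splits one register value into its two 16-bit fields. The function name
decode_pages is hypothetical and not part of rtw88; it only does the
arithmetic on a value pasted from the read_reg output.

```shell
# decode_pages VALUE: split a 32-bit register value into its two page counts.
# Hypothetical helper for illustration only, not part of the rtw88 driver.
decode_pages() {
    val=$(( $1 ))
    avail=$(( (val >> 16) & 0xffff ))   # upper 16 bits: available pages
    rsvd=$(( val & 0xffff ))            # lower 16 bits: reserved pages
    printf 'avail=0x%x rsvd=0x%x\n' "$avail" "$rsvd"
}

decode_pages 0x00230040   # the 0x230 value quoted above
```

For the 0x230 value shown above this prints avail=0x23 rsvd=0x40, i.e. fewer
available pages than reserved pages for that queue.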
What I can see is that for the other queues the number of reserved pages
usually matches the number of available pages. It happens sometimes that
the number of available pages goes down to 0x3f, but with the next
register read it goes back to 0x40. For 0x230 this is different though.
Here the number of available pages continuously decreases over time and
never goes back up.
I don't know what this is trying to tell me. It seems that things queued
to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
Unfortunately I also don't know how the different priority queues relate
to the different USB endpoints and how these in turn go together with
the qsel settings. Maybe Ping-Ke can shed some light on this.
To quickly check whether RTW_DMA_MAPPING_HIGH gets stuck, changing
qsel_to_ep[] to a different priority queue would help to identify the
problem. If only this queue does not work well, we can dig into the MAC
settings. Otherwise, it may be an RF performance problem.
0x240 is another queue, called the public queue. If 0x230/0x234/0x238/0x23c
become full, packets are queued into this queue. From the view of the MAC
circuit, it fetches these queues in a specific order (from high to low
conceptually; I'm 100% sure.), and applies EDCA contention parameters for
internal and external contention.
I don't have many useful ideas about this problem for now.
Ping-Ke and Sascha,
I made a discovery this morning. I set up a transfer from my NFS server to the
computer over an rtw8822bu link using rsync with the --progress option. In a
second window, I ran Sascha's register dump in a loop using a 5 second delay
between readouts. A third window was running 'dmesg -w'.
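A loop of that kind might look roughly like the sketch below. It only wraps
the register dump quoted earlier in the thread; the function name poll_regs
and its arguments are hypothetical, and the exact loop that was used is not
shown in this thread.

```shell
# poll_regs FILE COUNT DELAY: repeat the register dump COUNT times,
# sleeping DELAY seconds between readouts.
# Hypothetical helper; FILE is expected to be the rtw88 debugfs read_reg file.
poll_regs() {
    f=$1; count=$2; delay=$3
    n=0
    while [ "$n" -lt "$count" ]; do
        for i in 0x230 0x234 0x238 0x23c; do
            echo "$i 4" > "$f"   # same write as in the one-liner above
            cat "$f"             # read the result back
        done
        n=$((n + 1))
        if [ "$n" -lt "$count" ]; then
            sleep "$delay"
        fi
    done
}

# Example (needs root and a mounted debugfs):
# poll_regs "$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg)" 100 5
```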
The transfer ran to completion on a 5.8 GiB file with all incremental speeds
reported as 11-12 MB/s. No timeouts on flushing the queue were logged, until I
opened the NetworkManager applet! At that point, I got many queue timeouts
logged, and the instantaneous throughput dropped to 2-3 MB/s as I reported
earlier. Surprisingly, there were no changes in the registers when the errors
happened.
The NM applet is going to be reading the transfer rate from the device, which
apparently messes up the data flow to/from the device.
As long as I do not cause the NM applet to display the connections, I get
nothing logged.
Larry