Re: [Performance regression] BCM4359/9 on S905X2


 



Hi Marc,

On 21/03/2023 11:40, Marc Gonzalez wrote:
Hello everyone,

I've been benchmarking an Amlogic S905X2 board.
It provides a BCM4359/9 WiFi chip connected through SDIO.

There's a large performance gap between the vendor kernel and mainline.
(Downloading a 1GB file to /dev/null from a device inches away)

# curl -o /dev/null http://192.168.1.254:8095/fixed/1G
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
100 1024M  100 1024M    0     0  27.5M      0  0:00:37  0:00:37 --:--:-- 28.6M
vs
100 1024M  100 1024M    0     0  11.0M      0  0:01:32  0:01:32 --:--:-- 11.0M

Line 1 = vendor kernel (4.9.180 amlogic android)
Line 2 = mainline kernel (6.2.0-rc8)

Why is the vendor kernel 2.5 times faster?

(I'm using the same firmware files, but the vendor kernel appears to read
an additional configuration file that the mainline driver ignores.)
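As a quick sanity check on the "2.5 times" figure, the average rates can be recomputed from the elapsed times in the curl output. This is a minimal sketch that assumes curl's "M" unit means MiB (the 1G file being reported as 1024M suggests so). curl rounds the elapsed time to whole seconds, so the results differ slightly from its own "Average Dload" column.

```python
# Recompute average throughput from transfer size and elapsed wall-clock time.
# Assumption: curl's "M" is MiB, so the 1G test file is 1024 MiB.
SIZE_MIB = 1024


def mib_per_s(size_mib: float, seconds: float) -> float:
    """Average throughput in MiB/s."""
    return size_mib / seconds


vendor = mib_per_s(SIZE_MIB, 37)    # vendor kernel: 0:00:37
mainline = mib_per_s(SIZE_MIB, 92)  # mainline kernel: 0:01:32 = 92 s
ratio = vendor / mainline

print(f"vendor:   {vendor:.1f} MiB/s")
print(f"mainline: {mainline:.1f} MiB/s")
print(f"speedup:  {ratio:.2f}x")
```

Either way (these recomputed values, or curl's own 27.5M vs 11.0M averages), the ratio comes out at about 2.5.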

I think you should look at https://lore.kernel.org/all/20190527124307.32075-1-narmstrong@xxxxxxxxxxxx/

In summary, on G12A & G12B SoCs (S905X/Y/D2, A311D, S922X) the SDIO controller cannot access
DDR directly because of a hardware design bug; it can only use the 1.5KiB scratch buffer memory
located at the end of the controller registers.

Amlogic did a mask fix on those SoCs to allow routing the SDCard controller to the SDIO pads,
and in their kernel fork they use the SDCard controller for both the SDCard and SDIO buses by
switching the pads.

With this trick they managed to get almost the same bandwidth, but with some limitations
whenever SDCard transactions occur, and probably some conformance issues, since switching
affects the state of the SDCard & SDIO pads.

It was decided to upstream only the scratch buffer fix, because it was non-invasive
and reused the existing scratch buffer mechanism. This works, but it is limited
by the very small buffer and doesn't permit scatter-gather.
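To give a rough feel for why the small buffer hurts, the number of controller requests can be estimated. This is a hypothetical back-of-the-envelope model, not taken from any driver: it assumes every request is capped at 1.5KiB and that, without scatter-gather, each buffer segment has to be issued as separate requests. It ignores SDIO block/byte-mode details and does not try to quantify per-request overhead.

```python
# Hypothetical estimate: without scatter-gather each segment is issued on its
# own, and each request is capped at the scratch buffer size.
SCRATCH_LEN = 1536  # bytes: the 1.5KiB scratch buffer


def requests_needed(segments, max_req=SCRATCH_LEN):
    """Return how many controller requests the given segment sizes (bytes) need."""
    # Ceiling division per segment: a partial chunk still costs a full request.
    return sum((seg + max_req - 1) // max_req for seg in segments)


one_gib = requests_needed([1 << 30])          # one contiguous 1 GiB stream
sg_list = requests_needed([4096, 4096, 100])  # three separate segments
print(one_gib, sg_list)
```

Under these assumptions, 1 GiB of payload turns into roughly 699k individual transfers, each presumably carrying its own command and completion overhead.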

Porting Amlogic's current solution is out of scope, because sharing one controller between multiple
SD slots isn't implemented, and isn't even planned due to some concerns about the I/O states.

Neil


Regards



