On 08/24/2016 06:29 PM, Ed Swierk wrote:
I'm trying to migrate from the Octeon SDK to a vanilla Linux 4.4
kernel for a Cavium OCTEON II (CN6880) board running in 64-bit
little-endian mode. So far I've gotten most of the hardware features I
need working, including XAUI/RXAUI, USB, boot bus and I2C, with a
fairly small set of patches.
https://github.com/skyportsystems/linux/compare/master...octeon2
It is unclear what your motivations for doing this are, so I can think
of several things you could do:
A) Get v4.4 based SDK from Cavium.
B) Major rewrite of octeon-ethernet driver.
C) Live with current staging driver.
The biggest remaining hurdle is improving 10G Ethernet performance:
iperf -P 10 on the SDK kernel gets close to 10 Gbit/sec throughput,
while on my 4.4 kernel, it tops out around 1 Gbit/sec.
Comparing the octeon-ethernet driver in the SDK
(http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/tree/drivers/net/ethernet/octeon?h=apaliwal/octeon)
against the one in 4.4, the latter appears to utilize only a single
CPU core for the rx path. It's not clear to me if there is a similar
issue on the tx side, or other bottlenecks.
The main limiting factor for performance is single-threaded RX
processing. The out-of-tree vendor driver handles this mainly by
running multiple NAPI processing threads against the same RX queue
when there is a queue backlog. The disadvantage is that packets may be
received out of order, because RX processing is not synchronized
across the CPUs.
On the TX side, the locks on the queuing discipline can become
contended, leading to cache line bouncing. In the TX code of the driver
itself, there should be no impediments to parallel TX operations.
Ideally we would configure the packet classifiers on the RX side to
create multiple RX queues based on a hash of the TCP 5-tuple, and handle
each queue with a single NAPI instance. That should result in better
performance while maintaining packet ordering.
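
To make the flow-hashing idea concrete, here is a rough user-space
sketch. It is not code from either driver: struct flow_key,
flow_hash() and flow_to_rx_queue() are made-up names, and FNV-1a
merely stands in for whatever hash the hardware classifier would
compute. The point is only that every packet of a given flow maps to
the same queue, so one NAPI instance per queue sees that flow in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: addresses, ports and IP protocol (the 5-tuple). */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  protocol;
};

/* One FNV-1a step over a byte range (assumption: any stable hash would do). */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash each field separately so struct padding never enters the hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &k->src_ip, sizeof(k->src_ip));
	h = fnv1a(h, &k->dst_ip, sizeof(k->dst_ip));
	h = fnv1a(h, &k->src_port, sizeof(k->src_port));
	h = fnv1a(h, &k->dst_port, sizeof(k->dst_port));
	h = fnv1a(h, &k->protocol, sizeof(k->protocol));
	return h;
}

/* Same flow always yields the same queue index, so ordering is kept per flow. */
static unsigned int flow_to_rx_queue(const struct flow_key *k,
				     unsigned int num_queues)
{
	assert(num_queues > 0);
	return flow_hash(k) % num_queues;
}
```

With something like iperf -P 10, the ten connections differ in source
port, so they would be expected to spread over several queues, while
each individual connection stays on one queue.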
I started trying to port multi-CPU rx from the SDK octeon-ethernet
driver, but had trouble teasing out just the necessary bits without
following a maze of dependencies on unrelated functions. (Dragging
major parts of the SDK wholesale into 4.4 defeats the purpose of
switching to a vanilla kernel, and doesn't bring us closer to getting
octeon-ethernet out of staging.)
Yes, you have identified the main problem with this code.
All the code managing the SerDes and other MAC functions needs a
complete rewrite. One main problem is that all the SerDes/MACs in the
system are configured simultaneously instead of on a per-device basis.
There is also a plethora of different SerDes technologies in use
(RGMII, SGMII, QSGMII, XFI, XAUI, RXAUI, SPI-4.1, XLAUI, KR, ...). The
code that handles all of these is mixed together with huge case
statements switching on interface mode all over the place.
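
One possible shape for such a rewrite, sketched in plain C under heavy
assumptions: each interface mode supplies its own small ops structure,
and setup runs against one port at a time instead of a global pass
with a switch on mode at every step. All names here (struct mac_ops,
struct mac_port, mac_port_setup(), the if_mode enum) are hypothetical
and only three modes are shown; this is not how either driver is
structured today.

```c
#include <stddef.h>

/* Hypothetical subset of interface modes. */
enum if_mode { IF_SGMII, IF_XAUI, IF_RXAUI, IF_MODE_COUNT };

struct mac_port;

/* Per-mode operations; the mode-specific code lives behind these hooks. */
struct mac_ops {
	const char *name;
	int (*init)(struct mac_port *port);
};

/* State for a single port; nothing here is shared between ports. */
struct mac_port {
	const struct mac_ops *ops;
	unsigned int lanes;
	unsigned int speed_mbps;
	int configured;
};

static int sgmii_init(struct mac_port *p)
{
	p->lanes = 1;
	p->speed_mbps = 1000;
	return 0;
}

static int xaui_init(struct mac_port *p)
{
	p->lanes = 4;
	p->speed_mbps = 10000;
	return 0;
}

static int rxaui_init(struct mac_port *p)
{
	p->lanes = 2;
	p->speed_mbps = 10000;
	return 0;
}

static const struct mac_ops mac_ops_table[IF_MODE_COUNT] = {
	[IF_SGMII] = { "sgmii", sgmii_init },
	[IF_XAUI]  = { "xaui",  xaui_init },
	[IF_RXAUI] = { "rxaui", rxaui_init },
};

/* Configure exactly one port; other ports are left untouched. */
static int mac_port_setup(struct mac_port *port, enum if_mode mode)
{
	if (!port || (unsigned int)mode >= IF_MODE_COUNT)
		return -1;

	port->ops = &mac_ops_table[mode];
	if (port->ops->init(port) != 0)
		return -1;

	port->configured = 1;
	return 0;
}
```

Adding another mode would then mean adding one init function and one
table entry, rather than touching every switch statement.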
Code to handle target-mode PCI/PCIe packet engines is mixed in as
well. This stuff should probably be removed.
Has there been any work on the octeon-ethernet driver since this patch
set? https://www.linux-mips.org/archives/linux-mips/2015-08/msg00338.html
Any hints on what to pick out of the SDK code to improve 10G
performance would be appreciated.
--Ed