On Mon, Dec 06, 2021 at 10:28:46AM +0100, Clément Léger wrote:
> On Sat, 4 Dec 2021 13:43:43 +0000,
> Vladimir Oltean <vladimir.oltean@xxxxxxx> wrote:
>
> > On Fri, Dec 03, 2021 at 06:19:16PM +0100, Clément Léger wrote:
> > > Ethernet frames can be extracted or injected autonomously to or from
> > > the device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list
> > > data structures in memory are used for injecting or extracting
> > > Ethernet frames. The FDMA generates interrupts when frame extraction
> > > or injection is done and when the linked lists need updating.
> > >
> > > The FDMA is shared between all the Ethernet ports of the switch and
> > > uses a linked list of descriptors (DCB) to inject and extract packets.
> > > Before adding descriptors, the FDMA channels must be stopped. It would
> > > be inefficient to do that each time a descriptor is added, so the
> > > channels are restarted only once they have stopped.
> > >
> > > Both channels use a ring-like structure to feed the DCBs to the FDMA.
> > > head and tail are never touched by the hardware and are completely
> > > handled by the driver. On top of that, page recycling has been added
> > > and is mostly taken from the gianfar driver.
> > >
> > > Co-developed-by: Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>
> > > Signed-off-by: Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>
> > > Signed-off-by: Clément Léger <clement.leger@xxxxxxxxxxx>
> > > ---
> >
> > Doesn't look too bad. Did the page reuse make any difference to the
> > throughput, or is the interaction with the FDMA extraction channel
> > where the bottleneck is?
>
> With a standard MTU, the results did not improve much... TCP RX shows a
> small improvement (~4 Mbit/s) but that is the only one.
> Here are the new results with the FDMA:
>
> TCP TX: 48.2 Mbits/sec
> TCP RX: 60.9 Mbits/sec
> UDP TX: 28.8 Mbits/sec
> UDP RX: 18.8 Mbits/sec
>
> In jumbo mode (9000-byte frames), there are improvements:
>
> TCP TX: 74.4 Mbits/sec
> TCP RX: 109 Mbits/sec
> UDP TX: 105 Mbits/sec
> UDP RX: 51.6 Mbits/sec

Yeah, I don't know what else to tell you. Sorry.

> > > diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> > > index 11c99fcfd341..2667a203e10f 100644
> > > --- a/include/soc/mscc/ocelot.h
> > > +++ b/include/soc/mscc/ocelot.h
> > > @@ -692,6 +692,12 @@ struct ocelot {
> > >  	/* Protects the PTP clock */
> > >  	spinlock_t ptp_clock_lock;
> > >  	struct ptp_pin_desc ptp_pins[OCELOT_PTP_PINS_NUM];
> > > +
> > > +	struct ocelot_fdma *fdma;
> > > +	/* Napi context used by FDMA. Needs to be in ocelot to avoid using a
> > > +	 * backpointer in ocelot_fdma
> > > +	 */
> > > +	struct napi_struct napi;
> >
> > Can it at least be dynamically allocated, and kept as a pointer here?
>
> If it is dynamically allocated, then container_of can't be used anymore
> in the napi poll function. I could move it back into struct ocelot_fdma,
> but then I would need a backpointer to ocelot in the fdma struct.
> Or I could use napi->dev and access the ocelot_port_private to then get
> the ocelot pointer, but I have not seen many drivers using the napi->dev
> field. Tell me what you would like.

If you want to move it back to struct ocelot_fdma, you can do that,
I'm fine with that now :) Sorry for the trouble.
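
Something along these lines is what I have in mind - just a rough,
untested sketch, the member names are a guess and not taken from your
patch:

	#include <linux/kernel.h>	/* container_of() */
	#include <linux/netdevice.h>	/* struct napi_struct, napi_complete_done() */
	#include <soc/mscc/ocelot.h>	/* struct ocelot */

	struct ocelot_fdma {
		/* rings, DCBs, etc. omitted from this sketch */
		struct ocelot		*ocelot;	/* backpointer, set at init time */
		struct napi_struct	napi;
	};

	static int ocelot_fdma_napi_poll(struct napi_struct *napi, int budget)
	{
		struct ocelot_fdma *fdma = container_of(napi, struct ocelot_fdma,
							napi);
		struct ocelot *ocelot = fdma->ocelot;
		int work_done = 0;

		/* extraction/injection completion handling goes here,
		 * using the ocelot pointer recovered through the backpointer
		 */

		if (work_done < budget)
			napi_complete_done(napi, work_done);

		return work_done;
	}

That way the napi_struct stays embedded in ocelot_fdma, the poll
function still uses container_of, and you get back to ocelot without
going through napi->dev.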