Re: [PATCH net-next v5 10/10] net: axienet: Introduce dmaengine support

Jakub Kicinski <kuba@xxxxxxxxxx> · Mon, 14 Aug 2023 08:29:53 -0700

On Sat, 12 Aug 2023 15:27:19 +0000 Pandey, Radhey Shyam wrote:
> > Drop on error, you're not stopping the queue correctly, just drop, return OK
> > and avoid bugs.  
> 
> As I understand it, returning NETDEV_TX_OK means the driver took care
> of the packet. So, in line with the non-dmaengine xmit
> (axienet_start_xmit_legacy), should we stop the queue and return TX_BUSY?

You should only return BUSY if there is no space. All other errors
should lead to a drop and an increment of tx_errors. Otherwise a problem
with handling a single packet may stall the NIC forever.
It is somewhat confusing that we return TX_OK in that case, but it
is what it is.
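
To make the rule concrete, here is a minimal user-space sketch of the
return policy. It is a model, not driver code: txq_model, pkt_model,
model_xmit and the map_fails flag are hypothetical names made up for
illustration, and the enum only stands in for NETDEV_TX_OK and
NETDEV_TX_BUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's NETDEV_TX_OK / NETDEV_TX_BUSY (assumption). */
enum tx_ret { TX_OK, TX_BUSY };

struct txq_model {                /* hypothetical model of one TX queue */
	unsigned int free_slots;  /* descriptors still available */
	bool stopped;             /* models a stopped queue */
	unsigned long tx_packets;
	unsigned long tx_errors;
};

struct pkt_model {                /* hypothetical model of one skb */
	unsigned int nr_slots;    /* descriptors this packet needs */
	bool map_fails;           /* simulate a mapping/prep failure */
	bool freed;               /* models freeing the skb */
};

static enum tx_ret model_xmit(struct txq_model *q, struct pkt_model *pkt)
{
	assert(q != NULL && pkt != NULL);

	/* Only "no space" justifies BUSY: stop the queue, keep the packet. */
	if (q->free_slots < pkt->nr_slots) {
		q->stopped = true;
		return TX_BUSY;
	}

	/* Any other failure: drop, count it, and still report OK so that
	 * one bad packet cannot stall the queue.
	 */
	if (pkt->map_fails) {
		pkt->freed = true;
		q->tx_errors++;
		return TX_OK;
	}

	q->free_slots -= pkt->nr_slots;
	q->tx_packets++;
	return TX_OK;
}
```

In the drop path the packet is consumed by the driver, which is why
"OK" is returned even though nothing was sent.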

> > Why create a cache ?
> > Isn't it cleaner to create a fake ring buffer of sgl? Most packets will not have
> > MAX_SKB_FRAGS of memory. On a ring buffer you can use only as many sg
> > entries as the packet requires. Also no need to alloc/free.  
> 
> The kmem_cache is used with the intent of using the slab cache interface
> and reusing objects in the kernel. The slab cache maintains a
> cache of objects. When we free an object, instead of
> deallocating it, it gives it back to the cache. Next time, if we
> want to create a new object, the slab cache gives us one object from
> the cache.
> 
> If we maintain a custom circular buffer (struct circ_buf),
> we have to create two such ring buffers, one for TX and one for RX.
> For multichannel this multiplies by the number of queues. Also, we have
> to ensure proper occupancy checks and head/tail pointer updates.
> 
> With a kmem_cache pool we are offloading queue maintenance ops to the
> framework, with the benefit of optimized alloc/dealloc. Let me know if it
> looks functionally fine and whether we can retain it for this baseline
> dmaengine support version?

The kmem_cache is not the worst possible option, but note that the
objects you're allocating (with zeroing) are 512+ bytes. That's
pretty large when most packets will not have the full 16 fragments.
A ring buffer would let you match the allocation size to the
packet much better. Not to mention that it can be done fully locklessly.
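
A rough user-space sketch of the ring idea, assuming a single producer
(xmit) and a single consumer (completion). Everything here is
hypothetical: sg_slot stands in for struct scatterlist, SG_RING_SIZE is
an arbitrary value, and C11 atomics stand in for the kernel's barrier
primitives. A real scatterlist also has to cope with a packet whose
entries wrap around the end of the array, which this sketch only
handles by masking the index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SG_RING_SIZE 64u                 /* hypothetical; power of two */
#define SG_RING_MASK (SG_RING_SIZE - 1u)

struct sg_slot {                         /* stand-in for struct scatterlist */
	uintptr_t addr;
	uint32_t len;
};

struct sg_ring {
	struct sg_slot slot[SG_RING_SIZE];
	_Atomic uint32_t head;           /* written only by the producer */
	_Atomic uint32_t tail;           /* written only by the consumer */
};

/* Producer: claim exactly n slots, as many as the packet needs.
 * Returns false when the ring lacks space (the "BUSY" case).
 */
static bool sg_ring_reserve(struct sg_ring *r, uint32_t n, uint32_t *first)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	/* head and tail run freely; their difference is the slots in use. */
	if (SG_RING_SIZE - (head - tail) < n)
		return false;

	*first = head;
	atomic_store_explicit(&r->head, head + n, memory_order_release);
	return true;
}

/* Map the i-th slot of a reservation onto the array, wrapping by mask. */
static struct sg_slot *sg_ring_slot(struct sg_ring *r, uint32_t first,
				    uint32_t i)
{
	return &r->slot[(first + i) & SG_RING_MASK];
}

/* Consumer: give back the n oldest slots once the transfer completed. */
static void sg_ring_release(struct sg_ring *r, uint32_t n)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	assert(head - tail >= n);
	atomic_store_explicit(&r->tail, tail + n, memory_order_release);
}
```

A 3-fragment packet consumes 3 slots here rather than a full
MAX_SKB_FRAGS-sized object, and nothing is allocated, zeroed or freed
per packet.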