RE: [PATCH] rtw88: pci: Use general byte arrays as the elements of RX ring

Tony Chuang <yhchuang@xxxxxxxxxxx> · Tue, 30 Jul 2019 03:11:39 +0000

> > > > While allocating all 512 buffers in one block (just over 4MB)
> > > > is probably not a good idea, you may need to allocated (and dma map)
> > > > then in groups.
> > >
> > > Thanks for reviewing.  But got questions here to double confirm the
> idea.
> > > According to original code, it allocates 512 skbs for RX ring and dma
> > > mapping one by one.  So, the new code allocates memory buffer 512
> > > times to get 512 buffer arrays.  Will the 512 buffers arrays be in one
> > > block?  Do you mean aggregate the buffers as a scatterlist and use
> > > dma_map_sg?
> >
> > If you malloc a buffer of size (8192+32) the allocator will either
> > round it up to a whole number of (often 4k) pages or to a power of
> > 2 of pages - so either 12k of 16k.
> > I think the Linux allocator does the latter.
> > Some of the allocators also 'steal' a bit from the front of the buffer
> > for 'red tape'.
> >
> > OTOH malloc the space 15 buffers and the allocator will round the
> > 15*(8192 + 32) up to 32*4k - and you waste under 8k across all the
> > buffers.
> >
> > You then dma_map the large buffer and split into the actual rx buffers.
> > Repeat until you've filled the entire ring.
> > The only complication is remembering the base address (and size) for
> > the dma_unmap and free.
> > Although there is plenty of padding to extend the buffer structure
> > significantly without using more memory.
> > Allocate in 15's and you (probably) have 512 bytes per buffer.
> > Allocate in 31's and you have 256 bytes.
> >
> > The problem is that larger allocates are more likely to fail
> > (especially if the system has been running for some time).
> > So you almost certainly want to be able to fall back to smaller
> > allocates even though they use more memory.
> >
> > I also wonder if you actually need 512 8k rx buffers to cover
> > interrupt latency?
> > I've not done any measurements for 20 years!
> 
> Thanks for the explanation.
> I am not sure the combination of 512 8k RX buffers.  Maybe Realtek
> folks can give us some idea.
> Tony Chuang any comment?
> 
> Jian-Hong Pan
> 

512 RX buffers is not necessary I think. But I haven't had a chance to
test if reduce the number of RX SKBs could affect the latency.
I can run some throughput tests and then decide a minimum numbers
that RX ring requires. Or if you can try it.

Thanks.
Yan-Hsuan