On Wed, 26 Feb 2020 19:18:51 +0800
Xu Zaibo <xuzaibo@xxxxxxxxxx> wrote:

> Hi,
>
> On 2020/2/25 23:14, Jonathan Cameron wrote:
> > On Tue, 25 Feb 2020 11:16:52 +0800
> > Xu Zaibo <xuzaibo@xxxxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> On 2020/2/24 22:01, Jonathan Cameron wrote:
> >>> On Thu, 20 Feb 2020 17:04:55 +0800
> >>> Zaibo Xu <xuzaibo@xxxxxxxxxx> wrote:
> >>>
> [...]
> >>>>
> >>>> +static void sec_free_pbuf_resource(struct device *dev, struct sec_alg_res *res)
> >>>> +{
> >>>> +	if (res->pbuf)
> >>>> +		dma_free_coherent(dev, SEC_TOTAL_PBUF_SZ,
> >>>> +				  res->pbuf, res->pbuf_dma);
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * To improve performance, pbuffer is used for
> >>>> + * small packets (< 576Bytes) as IOMMU translation using.
> >>>> + */
> >>>> +static int sec_alloc_pbuf_resource(struct device *dev, struct sec_alg_res *res)
> >>>> +{
> >>>> +	int pbuf_page_offset;
> >>>> +	int i, j, k;
> >>>> +
> >>>> +	res->pbuf = dma_alloc_coherent(dev, SEC_TOTAL_PBUF_SZ,
> >>>> +				       &res->pbuf_dma, GFP_KERNEL);
> >>> Would it make more sense perhaps to do this as a DMA pool and have
> >>> it expand on demand?
> >> Since there exist all kinds of buffer lengths, I think dma_alloc_coherent
> >> may be better?
> > As it currently stands we allocate a large buffer in one go but ensure
> > we only have a single dma map that occurs at startup.
> >
> > If we allocate every time (don't use pbuf) performance is hit by
> > the need to set up the page table entries and flush for every request.
> >
> > A dma pool with a fixed size element would at worst (for small messages)
> > mean you had to do a dma map / unmap once every 6-ish buffers.
> > This would only happen if you filled the whole queue.  Under normal operation
> > you will have a fairly steady number of buffers in use at a time, so mostly
> > it would be reusing buffers that were already mapped from a previous request.
> Agree, a dma pool may give a smaller range of mapped memory, which may
> increase the IOMMU TLB hit rate.
> >
> > You could implement your own allocator on top of dma_alloc_coherent but it'll
> > probably be messy and cost you more than using fixed size small elements.
> >
> > So a dmapool here would give you a mid point between using lots of memory
> > and never needing to map/unmap vs map/unmap every time.
> >
> My concern is the spinlock of the DMA pool, which adds an exclusion between
> sending requests and receiving responses, since DMA blocks are allocated
> when sending and freed when receiving.

Agreed.  That may be a bottleneck.  Not clear to me whether that would be
a significant issue or not.

Jonathan

>
> Thanks,
> Zaibo
>
> .
> >>>
> >>>> +	if (!res->pbuf)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	/*
> >>>> +	 * SEC_PBUF_PKG contains data pbuf, iv and
> >>>> +	 * out_mac : <SEC_PBUF|SEC_IV|SEC_MAC>
> >>>> +	 * Every PAGE contains six SEC_PBUF_PKG
> >>>> +	 * The sec_qp_ctx contains QM_Q_DEPTH numbers of SEC_PBUF_PKG
> >>>> +	 * So we need SEC_PBUF_PAGE_NUM numbers of PAGE
> >>>> +	 * for the SEC_TOTAL_PBUF_SZ
> >>>> +	 */
> >>>> +	for (i = 0; i <= SEC_PBUF_PAGE_NUM; i++) {
> >>>> +		pbuf_page_offset = PAGE_SIZE * i;
> >>>> +		for (j = 0; j < SEC_PBUF_NUM; j++) {
> >>>> +			k = i * SEC_PBUF_NUM + j;
> >>>> +			if (k == QM_Q_DEPTH)
> >>>> +				break;
> >>>> +			res[k].pbuf = res->pbuf +
> >>>> +				j * SEC_PBUF_PKG + pbuf_page_offset;
> >>>> +			res[k].pbuf_dma = res->pbuf_dma +
> >>>> +				j * SEC_PBUF_PKG + pbuf_page_offset;
> >>>> +		}
> >>>> +	}
> >>>> +	return 0;
> >>>> +}
> >>>> +
> [...]
>
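
For readers following the dma_pool alternative discussed in the thread above,
a minimal sketch of that approach is given below.  It only shows the standard
kernel dmapool API (dma_pool_create/dma_pool_alloc/dma_pool_free); the field
names (pbuf_pool, req->pbuf, req->pbuf_dma) and the placement in sec_qp_ctx
are illustrative assumptions for this sketch, not the driver's actual layout.

#include <linux/device.h>
#include <linux/dmapool.h>

/*
 * Sketch only: a pool of fixed-size SEC_PBUF_PKG elements.  The pool maps a
 * new page only when all elements carved from already-mapped pages are in
 * use, which is the "map/unmap once every few buffers at worst" behaviour
 * described above.
 */
static int sec_create_pbuf_pool(struct device *dev, struct sec_qp_ctx *qp_ctx)
{
	/* One element per request: <data pbuf | IV | out_mac> */
	qp_ctx->pbuf_pool = dma_pool_create("sec_pbuf", dev,
					    SEC_PBUF_PKG, 0, 0);
	if (!qp_ctx->pbuf_pool)
		return -ENOMEM;

	return 0;
}

static int sec_request_map_pbuf(struct sec_qp_ctx *qp_ctx, struct sec_req *req)
{
	/* Send path: take an element, mapping a fresh page only if needed. */
	req->pbuf = dma_pool_alloc(qp_ctx->pbuf_pool, GFP_ATOMIC,
				   &req->pbuf_dma);
	if (!req->pbuf)
		return -ENOMEM;

	return 0;
}

static void sec_request_unmap_pbuf(struct sec_qp_ctx *qp_ctx, struct sec_req *req)
{
	/* Completion path: return the element to the pool for reuse. */
	dma_pool_free(qp_ctx->pbuf_pool, req->pbuf, req->pbuf_dma);
}

Both dma_pool_alloc() and dma_pool_free() take the pool's internal spinlock,
so the send and completion paths would contend on it; that is the potential
bottleneck Zaibo raises and Jonathan acknowledges above.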