On Wed, 26 Feb 2020 19:18:51 +0800
Xu Zaibo <xuzaibo@xxxxxxxxxx> wrote:

> Hi,
>
> On 2020/2/25 23:14, Jonathan Cameron wrote:
> > On Tue, 25 Feb 2020 11:16:52 +0800
> > Xu Zaibo <xuzaibo@xxxxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> On 2020/2/24 22:01, Jonathan Cameron wrote:
> >>> On Thu, 20 Feb 2020 17:04:55 +0800
> >>> Zaibo Xu <xuzaibo@xxxxxxxxxx> wrote:
> >>>
> [...]
> >>>>
> >>>> +static void sec_free_pbuf_resource(struct device *dev, struct sec_alg_res *res)
> >>>> +{
> >>>> +	if (res->pbuf)
> >>>> +		dma_free_coherent(dev, SEC_TOTAL_PBUF_SZ,
> >>>> +				  res->pbuf, res->pbuf_dma);
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * To improve performance, pbuffer is used for
> >>>> + * small packets (< 576Bytes) as IOMMU translation using.
> >>>> + */
> >>>> +static int sec_alloc_pbuf_resource(struct device *dev, struct sec_alg_res *res)
> >>>> +{
> >>>> +	int pbuf_page_offset;
> >>>> +	int i, j, k;
> >>>> +
> >>>> +	res->pbuf = dma_alloc_coherent(dev, SEC_TOTAL_PBUF_SZ,
> >>>> +				       &res->pbuf_dma, GFP_KERNEL);
> >>> Would it make more sense perhaps to do this as a DMA pool and have
> >>> it expand on demand?
> >> Since there exist all kinds of buffer lengths, I think dma_alloc_coherent
> >> may be better?
> > As it currently stands we allocate a large buffer in one go but ensure
> > we only have a single dma map that occurs at startup.
> >
> > If we allocate every time (don't use pbuf) performance is hit by
> > the need to set up the page table entries and flush for every request.
> >
> > A dma pool with a fixed size element would at worst (for small messages)
> > mean you had to do a dma map / unmap once every 6-ish buffers.
> > This would only happen if you filled the whole queue.  Under normal operation
> > you will have a fairly steady number of buffers in use at a time, so mostly
> > it would be reusing buffers that were already mapped from a previous request.
> Agree, a dma pool may give a smaller range of mapped memory, which may
> increase the IOMMU TLB hit rate.
> >
> > You could implement your own allocator on top of dma_alloc_coherent but it'll
> > probably be messy and cost you more than using fixed size small elements.
> >
> > So a dmapool here would give you a mid point between using lots of memory
> > and never needing to map/unmap vs map/unmap every time.
> >
> My concern is the spinlock of the DMA pool, which adds an exclusion between
> sending requests and receiving responses, since DMA blocks are allocated
> when sending and freed when receiving.

Agreed.  That may be a bottleneck.  Not clear to me whether that would be
a significant issue or not.

Jonathan

>
> Thanks,
> Zaibo
>
> .
> >>>
> >>>> +	if (!res->pbuf)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	/*
> >>>> +	 * SEC_PBUF_PKG contains data pbuf, iv and
> >>>> +	 * out_mac : <SEC_PBUF|SEC_IV|SEC_MAC>
> >>>> +	 * Every PAGE contains six SEC_PBUF_PKG
> >>>> +	 * The sec_qp_ctx contains QM_Q_DEPTH numbers of SEC_PBUF_PKG
> >>>> +	 * So we need SEC_PBUF_PAGE_NUM numbers of PAGE
> >>>> +	 * for the SEC_TOTAL_PBUF_SZ
> >>>> +	 */
> >>>> +	for (i = 0; i <= SEC_PBUF_PAGE_NUM; i++) {
> >>>> +		pbuf_page_offset = PAGE_SIZE * i;
> >>>> +		for (j = 0; j < SEC_PBUF_NUM; j++) {
> >>>> +			k = i * SEC_PBUF_NUM + j;
> >>>> +			if (k == QM_Q_DEPTH)
> >>>> +				break;
> >>>> +			res[k].pbuf = res->pbuf +
> >>>> +				j * SEC_PBUF_PKG + pbuf_page_offset;
> >>>> +			res[k].pbuf_dma = res->pbuf_dma +
> >>>> +				j * SEC_PBUF_PKG + pbuf_page_offset;
> >>>> +		}
> >>>> +	}
> >>>> +	return 0;
> >>>> +}
> >>>> +
> [...]
>
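
For readers following the dma_pool alternative discussed in the thread above,
a minimal sketch of that approach is given below.  It only shows the standard
kernel dmapool API (dma_pool_create/dma_pool_alloc/dma_pool_free); the field
names (pbuf_pool, req->pbuf, req->pbuf_dma) and the placement in sec_qp_ctx
are illustrative assumptions for this sketch, not the driver's actual layout.

#include <linux/device.h>
#include <linux/dmapool.h>

/*
 * Sketch only: a pool of fixed-size SEC_PBUF_PKG elements.  The pool maps a
 * new page only when all elements carved from already-mapped pages are in
 * use, which is the "map/unmap once every few buffers at worst" behaviour
 * described above.
 */
static int sec_create_pbuf_pool(struct device *dev, struct sec_qp_ctx *qp_ctx)
{
	/* One element per request: <data pbuf | IV | out_mac> */
	qp_ctx->pbuf_pool = dma_pool_create("sec_pbuf", dev,
					    SEC_PBUF_PKG, 0, 0);
	if (!qp_ctx->pbuf_pool)
		return -ENOMEM;

	return 0;
}

static int sec_request_map_pbuf(struct sec_qp_ctx *qp_ctx, struct sec_req *req)
{
	/* Send path: take an element, mapping a fresh page only if needed. */
	req->pbuf = dma_pool_alloc(qp_ctx->pbuf_pool, GFP_ATOMIC,
				   &req->pbuf_dma);
	if (!req->pbuf)
		return -ENOMEM;

	return 0;
}

static void sec_request_unmap_pbuf(struct sec_qp_ctx *qp_ctx, struct sec_req *req)
{
	/* Completion path: return the element to the pool for reuse. */
	dma_pool_free(qp_ctx->pbuf_pool, req->pbuf, req->pbuf_dma);
}

Both dma_pool_alloc() and dma_pool_free() take the pool's internal spinlock,
so the send and completion paths would contend on it; that is the potential
bottleneck Zaibo raises and Jonathan acknowledges above.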