Hi,
On 2020/2/26 22:30, Jonathan Cameron wrote:
On Wed, 26 Feb 2020 19:18:51 +0800
Xu Zaibo <xuzaibo@xxxxxxxxxx> wrote:
Hi,
On 2020/2/25 23:14, Jonathan Cameron wrote:
On Tue, 25 Feb 2020 11:16:52 +0800
Xu Zaibo <xuzaibo@xxxxxxxxxx> wrote:
Hi,
On 2020/2/24 22:01, Jonathan Cameron wrote:
On Thu, 20 Feb 2020 17:04:55 +0800
Zaibo Xu <xuzaibo@xxxxxxxxxx> wrote:
[...]
+static void sec_free_pbuf_resource(struct device *dev, struct sec_alg_res *res)
+{
+	if (res->pbuf)
+		dma_free_coherent(dev, SEC_TOTAL_PBUF_SZ,
+				  res->pbuf, res->pbuf_dma);
+}
+
+/*
+ * To improve performance, a pre-mapped pbuffer is used for
+ * small packets (< 576 bytes) when going through IOMMU translation.
+ */
+static int sec_alloc_pbuf_resource(struct device *dev, struct sec_alg_res *res)
+{
+	int pbuf_page_offset;
+	int i, j, k;
+
+	res->pbuf = dma_alloc_coherent(dev, SEC_TOTAL_PBUF_SZ,
+				       &res->pbuf_dma, GFP_KERNEL);
Would it make more sense perhaps to do this as a DMA pool and have
it expand on demand?
Since the buffers come in all kinds of lengths, I think dma_alloc_coherent
may be better?
As it currently stands, we allocate a large buffer in one go and ensure
we only have a single dma map, which occurs at startup.
If we allocate every time (i.e. don't use pbuf), performance is hit by
the need to set up the page table entries and flush for every request.
A dma pool with fixed-size elements would at worst (for small messages)
mean you had to do a dma map / unmap every 6-ish buffers.
This would only happen if you filled the whole queue. Under normal operation
you will have a fairly steady number of buffers in use at a time, so mostly
it would be reusing buffers that were already mapped from a previous request.
Agreed, a dma pool may give a smaller range of mapped memory, which may
increase IOMMU TLB hits.
You could implement your own allocator on top of dma_alloc_coherent, but it'll
probably be messy and cost you more than using fixed-size small elements.
So a dmapool here would give you a mid point between using lots of memory
and never needing to map/unmap vs map/unmap every time.
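Roughly along these lines, perhaps (untested sketch against the generic
dmapool API; the sec_pbuf_* helpers and the pool name are made up here,
only SEC_PBUF_PKG as the element size comes from your patch):

#include <linux/dmapool.h>

/* Created once at queue setup; pool pages stay DMA-mapped for its lifetime. */
static struct dma_pool *sec_pbuf_pool;

static int sec_pbuf_pool_init(struct device *dev)
{
	/* Fixed-size elements of SEC_PBUF_PKG bytes, no special alignment. */
	sec_pbuf_pool = dma_pool_create("sec_pbuf", dev, SEC_PBUF_PKG, 0, 0);
	return sec_pbuf_pool ? 0 : -ENOMEM;
}

/* Send path: mostly reuses an element whose page is already mapped;
 * a new page is mapped only when the pool has to grow. */
static void *sec_pbuf_get(dma_addr_t *dma)
{
	return dma_pool_alloc(sec_pbuf_pool, GFP_ATOMIC, dma);
}

/* Completion path: the element goes back to the pool and stays mapped. */
static void sec_pbuf_put(void *pbuf, dma_addr_t dma)
{
	dma_pool_free(sec_pbuf_pool, pbuf, dma);
}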
My concern is the spinlock of the DMA pool, which adds exclusion between
sending requests and receiving responses, since DMA blocks are allocated
when sending and freed when receiving.
Agreed. That may be a bottleneck. Not clear to me whether that would be a
significant issue or not.
Anyway, we will test the performance of the DMA pool to find the better solution.
Thanks,
Zaibo
+	if (!res->pbuf)
+		return -ENOMEM;
+
+	/*
+	 * Each SEC_PBUF_PKG packs the data pbuf, the IV and the
+	 * out_mac: <SEC_PBUF|SEC_IV|SEC_MAC>.
+	 * Every page holds six SEC_PBUF_PKG elements, and sec_qp_ctx
+	 * needs QM_Q_DEPTH of them, so SEC_PBUF_PAGE_NUM pages are
+	 * needed for the SEC_TOTAL_PBUF_SZ.
+	 */
+	/* i == SEC_PBUF_PAGE_NUM covers the tail that only part-fills a page */
+	for (i = 0; i <= SEC_PBUF_PAGE_NUM; i++) {
+		pbuf_page_offset = PAGE_SIZE * i;
+		for (j = 0; j < SEC_PBUF_NUM; j++) {
+			k = i * SEC_PBUF_NUM + j;
+			if (k == QM_Q_DEPTH)
+				break;
+			res[k].pbuf = res->pbuf +
+				j * SEC_PBUF_PKG + pbuf_page_offset;
+			res[k].pbuf_dma = res->pbuf_dma +
+				j * SEC_PBUF_PKG + pbuf_page_offset;
+		}
+	}
+	return 0;
+}
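
(Aside: the indexing above only adds up if the size macros relate more or
less as below; this is an assumed reconstruction for illustration, not a
quote of the driver header:)

/*
 * Assumed relationships between the macros used above: SEC_PBUF_NUM
 * packages fit in one page, and a QM_Q_DEPTH that is not a multiple of
 * SEC_PBUF_NUM leaves a remainder on one extra partial page; that is
 * why the outer loop runs to i == SEC_PBUF_PAGE_NUM inclusive.
 */
#define SEC_PBUF_PKG		(SEC_PBUF_SZ + SEC_IV_SIZE + SEC_MAX_MAC_LEN)
#define SEC_PBUF_NUM		(PAGE_SIZE / SEC_PBUF_PKG)
#define SEC_PBUF_PAGE_NUM	(QM_Q_DEPTH / SEC_PBUF_NUM)
#define SEC_PBUF_LEFT_SZ	(SEC_PBUF_PKG * \
				 (QM_Q_DEPTH - SEC_PBUF_PAGE_NUM * SEC_PBUF_NUM))
#define SEC_TOTAL_PBUF_SZ	(PAGE_SIZE * SEC_PBUF_PAGE_NUM + SEC_PBUF_LEFT_SZ)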
+
[...]
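
(Aside: the submit path presumably gates on packet length somewhere, along
these lines; the helper and macro names here are hypothetical, only the
576-byte threshold comes from the comment above:)

#define SEC_SMALL_PKT_THRES	576	/* assumed name; value per the comment */

/* Hypothetical sketch: choose the pre-mapped pbuf for small packets,
 * fall back to per-request scatterlist mapping otherwise. */
static bool sec_use_pbuf(u32 len)
{
	return len < SEC_SMALL_PKT_THRES;
}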