On Fri, 20 Oct 2023, Giovanni Cabiddu wrote:

> On Mon, Oct 16, 2023 at 01:26:47PM +0200, Mikulas Patocka wrote:
> > Hi
> >
> > I created this kernel module that stress-tests the crypto API:
> > https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c
> >
> > It shows that QAT underperforms significantly compared to AES-NI (for
> > large requests it is 10 times slower; for small requests it is even worse)
> > - see the second table in this document:
> > https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt
> >
> > QAT has higher priority than AES-NI, so the kernel prefers it (it is not
> > used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY",
> > but it is preferred over AES-NI in other cases).
> Probably you can get better performance by modifying your configuration
> and test.
> From your test application I can infer that you are using a single QAT
> device. The driver allocates a ring pair per TFM and it load balances
> allocations between devices.
> In addition, jobs are submitted synchronously. This way the cost of
> offload is not amortised between requests.
>
> Regards,
>
> --
> Giovanni

Yes, I thought about using more TFMs, but I don't have access to the dual
Xeon machine anymore (I would have to request access and wait until the
people who are using it release it). We can run more tests, but it would be
best to batch all the tests into a short timeframe, to minimize blocking the
machine for other people.

Regarding synchronous submission - the current test submits 112 requests
concurrently. Do you think that is too few and we should submit more? What
would be an appropriate number of requests to submit concurrently?

I used synchronous submission because it was easier to write than
asynchronous submission, but if you think that 112 requests is too few and
we need to submit more, I can try to do it.

Mikulas