On Fri, Sep 22, 2023 at 06:34:16PM +0200, Mikulas Patocka wrote: > On Fri, 22 Sep 2023, Giovanni Cabiddu wrote: > > Hi Mikulas, > > > > many thanks for reporting this issue and finding a solution. > > > > On Thu, Sep 21, 2023 at 10:53:55PM +0200, Mikulas Patocka wrote: > > > I was evaluating whether it is feasible to use QAT with dm-crypt (the > > > answer is that it is not - QAT is slower than AES-NI for this type of > > > workload; QAT starts to be benefical for encryption requests longer than > > > 64k). > > Correct. Is there anything that we can do to batch requests in a single > > call? > > Ask Herbert Xu. I think it would complicate the design of crypto API. > > > Sometime ago there was some work done to build a geniv template cipher > > and optimize dm-crypt to encrypt larger block sizes in a single call, > > see [1][2]. Don't know if that work was completed. > > > > >And I got some deadlocks. > > Ouch! > > > > > The reason for the deadlocks is this: suppose that one of the "if" > > > conditions in "qat_alg_send_message_maybacklog" is true and we jump to the > > > "enqueue" label. At this point, an interrupt comes in and clears all > > > pending messages. Now, the interrupt returns, we grab backlog->lock, add > > > the message to the backlog, drop backlog->lock - and there is no one to > > > remove the backlogged message out of the list and submit it. > > Makes sense. In my testing I wasn't able to reproduce this condition. > > I reproduced it with this: > Use a system with two Intel(R) Xeon(R) Gold 5420+ processors > Use a kernel 6.6-rc2 > Patch the kernel, so that dm-crypt uses QAT - that is, in > drivers/md/dm-crypt.c, replace all strings > "CRYPTO_ALG_ALLOCATES_MEMORY" with "0" > Use .config from RHEL-9.4 beta and compile the kernel > On the system, disable hyperthreading with > "echo off >/sys/devices/system/cpu/smt/control" > Activate dm-crypt on the top of nvme: > "cryptsetup create cr /dev/nvme3n1 --sector-size=4096" > Run fio in a loop: > "while true; do > fio --ioengine=psync --iodepth=1 --rw=randwrite --direct=1 > --end_fsync=1 --bs=64k --numjobs=56 --time_based > --runtime=10 --group_reporting --name=job > --filename=/dev/mapper/cr > done" > > With this setup, I get a deadlock in a few iterations of fio. > > > > I fixed it with this patch - with this patch, the test passes and there > > > are no longer any deadlocks. I didn't want to add a spinlock to the hot > > > path, so I take it only if some of the condition suggests that queuing may > > > be required. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > > > Cc: stable@xxxxxxxxxxxxxxx > > The commit message requires a bit of rework to describe the change. > > I improved the message and I send a second version of the patch. > > > Also, deserves a fixes tag. > > "Fixes" tag is for something that worked and that was broken in some > previous commit. That's right. > A quick search through git shows that QAT backlogging was > broken since the introduction of QAT. The driver was moved from drivers/crypto/qat to drivers/crypto/intel/qat that's why you see a single patch. This fixes 386823839732 ("crypto: qat - add backlog mechanism") > > > > > > > --- > > > drivers/crypto/intel/qat/qat_common/qat_algs_send.c | 31 ++++++++++++-------- > > > 1 file changed, 20 insertions(+), 11 deletions(-) > > > > > > Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c > > > =================================================================== > > > --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c > > > +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c > > > @@ -40,16 +40,6 @@ void qat_alg_send_backlog(struct qat_ins > > > spin_unlock_bh(&backlog->lock); > > > } > > > > > > -static void qat_alg_backlog_req(struct qat_alg_req *req, > > > - struct qat_instance_backlog *backlog) > > > -{ > > > - INIT_LIST_HEAD(&req->list); > > Is the initialization of an element no longer needed? > > It was never needed. list_add_tail calls __list_add and __list_add > overwrites new->next and new->prev without reading them. So, there's no > need to initialize them. > > > > - > > > - spin_lock_bh(&backlog->lock); > > > - list_add_tail(&req->list, &backlog->list); > > > - spin_unlock_bh(&backlog->lock); > > > -} > > > - > > > static int qat_alg_send_message_maybacklog(struct qat_alg_req *req) > > > { > > > struct qat_instance_backlog *backlog = req->backlog; > > > @@ -71,8 +61,27 @@ static int qat_alg_send_message_maybackl > > > return -EINPROGRESS; > > > > > > enqueue: > > > - qat_alg_backlog_req(req, backlog); > > > + spin_lock_bh(&backlog->lock); > > > + > > > + /* If any request is already backlogged, then add to backlog list */ > > > + if (!list_empty(&backlog->list)) > > > + goto enqueue2; > > > > > > + /* If ring is nearly full, then add to backlog list */ > > > + if (adf_ring_nearly_full(tx_ring)) > > > + goto enqueue2; > > > + > > > + /* If adding request to HW ring fails, then add to backlog list */ > > > + if (adf_send_message(tx_ring, fw_req)) > > > + goto enqueue2; > > In a nutshell, you are re-doing the same steps taking the backlog lock. > > > > It should be possible to re-write it so that there is a function that > > attempts enqueuing and if it fails, then the same is called again taking > > the lock. > > If you want I can rework it and resubmit. > > Yes, if you prefer it this way, I reworked the patch so that we execute > the same code with or without the spinlock held. > > > > + > > > + spin_unlock_bh(&backlog->lock); > > > + return -EINPROGRESS; > > > + > > > +enqueue2: > > > + list_add_tail(&req->list, &backlog->list); > > > + > > > + spin_unlock_bh(&backlog->lock); > > > return -EBUSY; > > > } > > > > [1] https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1276510.html > > [2] https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1428293.html > > > > Regards, > > > > -- > > Giovanni > > > > From: Mikulas Patocka <mpatocka@xxxxxxxxxx> > Subject: [PATCH] qat: fix deadlock in backlog processing crypto: qat - fix ... > > I was testing QAT with dm-crypt and I got some deadlocks. > > The reason for the deadlocks is this: suppose that one of the "if" > conditions in "qat_alg_send_message_maybacklog" is true and we jump to the > "enqueue" label. At this point, an interrupt comes in and clears all > pending messages. Now, the interrupt returns, we grab backlog->lock, add > the message to the backlog, drop backlog->lock - and there is no one to > remove the backlogged message out of the list and submit it. > > In order to fix the bug, we must hold the spinlock backlog->lock when we > perform test for free space in the ring - so that the test for free space > and adding the request to a backlog is atomic and can't be interrupted by > an interrupt. Every completion interrupt calls qat_alg_send_backlog which > grabs backlog->lock, so holding this spinlock is sufficient to synchronize > with interrupts. > > I didn't want to add a spinlock unconditionally to the hot path for > performance reasons, so I take it only if some of the condition suggests > that queuing may be required. > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > > --- > drivers/crypto/intel/qat/qat_common/qat_algs_send.c | 23 ++++++++++---------- > 1 file changed, 12 insertions(+), 11 deletions(-) > > Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c > =================================================================== > --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c > +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c > @@ -40,22 +40,14 @@ void qat_alg_send_backlog(struct qat_ins > spin_unlock_bh(&backlog->lock); > } > > -static void qat_alg_backlog_req(struct qat_alg_req *req, > - struct qat_instance_backlog *backlog) > -{ > - INIT_LIST_HEAD(&req->list); > - > - spin_lock_bh(&backlog->lock); > - list_add_tail(&req->list, &backlog->list); > - spin_unlock_bh(&backlog->lock); > -} > - > static int qat_alg_send_message_maybacklog(struct qat_alg_req *req) > { > struct qat_instance_backlog *backlog = req->backlog; > struct adf_etr_ring_data *tx_ring = req->tx_ring; > u32 *fw_req = req->fw_req; > + bool locked = false; > > +repeat: > /* If any request is already backlogged, then add to backlog list */ > if (!list_empty(&backlog->list)) > goto enqueue; > @@ -68,11 +60,20 @@ static int qat_alg_send_message_maybackl > if (adf_send_message(tx_ring, fw_req)) > goto enqueue; > > + if (unlikely(locked)) > + spin_unlock_bh(&backlog->lock); > return -EINPROGRESS; > > enqueue: > - qat_alg_backlog_req(req, backlog); > + if (!locked) { > + spin_lock_bh(&backlog->lock); > + locked = true; > + goto repeat; > + } > + > + list_add_tail(&req->list, &backlog->list); > > + spin_unlock_bh(&backlog->lock); > return -EBUSY; > } > > Thanks - when I proposed the rework I was thinking at a solution without gotos. Here is a draft: ------------8<---------------- .../intel/qat/qat_common/qat_algs_send.c | 40 ++++++++++--------- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c index bb80455b3e81..18c6a233ab96 100644 --- a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c +++ b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c @@ -40,17 +40,7 @@ void qat_alg_send_backlog(struct qat_instance_backlog *backlog) spin_unlock_bh(&backlog->lock); } -static void qat_alg_backlog_req(struct qat_alg_req *req, - struct qat_instance_backlog *backlog) -{ - INIT_LIST_HEAD(&req->list); - - spin_lock_bh(&backlog->lock); - list_add_tail(&req->list, &backlog->list); - spin_unlock_bh(&backlog->lock); -} - -static int qat_alg_send_message_maybacklog(struct qat_alg_req *req) +static bool qat_alg_try_enqueue(struct qat_alg_req *req) { struct qat_instance_backlog *backlog = req->backlog; struct adf_etr_ring_data *tx_ring = req->tx_ring; @@ -58,22 +48,36 @@ static int qat_alg_send_message_maybacklog(struct qat_alg_req *req) /* If any request is already backlogged, then add to backlog list */ if (!list_empty(&backlog->list)) - goto enqueue; + return false; /* If ring is nearly full, then add to backlog list */ if (adf_ring_nearly_full(tx_ring)) - goto enqueue; + return false; /* If adding request to HW ring fails, then add to backlog list */ if (adf_send_message(tx_ring, fw_req)) - goto enqueue; + return false; - return -EINPROGRESS; + return true; +} -enqueue: - qat_alg_backlog_req(req, backlog); - return -EBUSY; +static int qat_alg_send_message_maybacklog(struct qat_alg_req *req) +{ + struct qat_instance_backlog *backlog = req->backlog; + int ret = -EINPROGRESS; + + if (qat_alg_try_enqueue(req)) + return ret; + + spin_lock_bh(&backlog->lock); + if (!qat_alg_try_enqueue(req)) { + list_add_tail(&req->list, &backlog->list); + ret = -EBUSY; + } + spin_unlock_bh(&backlog->lock); + + return ret; } int qat_alg_send_message(struct qat_alg_req *req) -- 2.41.0