Re: [PATCH v2] qat: fix deadlock in backlog processing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 22, 2023 at 06:34:16PM +0200, Mikulas Patocka wrote:
> On Fri, 22 Sep 2023, Giovanni Cabiddu wrote:
> > Hi Mikulas,
> > 
> > many thanks for reporting this issue and finding a solution.
> > 
> > On Thu, Sep 21, 2023 at 10:53:55PM +0200, Mikulas Patocka wrote:
> > > I was evaluating whether it is feasible to use QAT with dm-crypt (the 
> > > answer is that it is not - QAT is slower than AES-NI for this type of 
> > > workload; QAT starts to be benefical for encryption requests longer than 
> > > 64k).
> > Correct. Is there anything that we can do to batch requests in a single
> > call?
> 
> Ask Herbert Xu. I think it would complicate the design of crypto API.
> 
> > Sometime ago there was some work done to build a geniv template cipher
> > and optimize dm-crypt to encrypt larger block sizes in a single call,
> > see [1][2]. Don't know if that work was completed.
> >
> > >And I got some deadlocks.
> > Ouch!
> > 
> > > The reason for the deadlocks is this: suppose that one of the "if"
> > > conditions in "qat_alg_send_message_maybacklog" is true and we jump to the
> > > "enqueue" label. At this point, an interrupt comes in and clears all
> > > pending messages. Now, the interrupt returns, we grab backlog->lock, add
> > > the message to the backlog, drop backlog->lock - and there is no one to
> > > remove the backlogged message out of the list and submit it.
> > Makes sense. In my testing I wasn't able to reproduce this condition.
> 
> I reproduced it with this:
> Use a system with two Intel(R) Xeon(R) Gold 5420+ processors
> Use a kernel 6.6-rc2
> Patch the kernel, so that dm-crypt uses QAT - that is, in 
> 	drivers/md/dm-crypt.c, replace all strings 
> 	"CRYPTO_ALG_ALLOCATES_MEMORY" with "0"
> Use .config from RHEL-9.4 beta and compile the kernel
> On the system, disable hyperthreading with
> 	"echo off >/sys/devices/system/cpu/smt/control"
> Activate dm-crypt on the top of nvme:
> 	"cryptsetup create cr /dev/nvme3n1 --sector-size=4096"
> Run fio in a loop:
> 	"while true; do
> 		fio --ioengine=psync --iodepth=1 --rw=randwrite --direct=1 
> 		--end_fsync=1 --bs=64k --numjobs=56 --time_based 
> 		--runtime=10 --group_reporting --name=job 
> 		--filename=/dev/mapper/cr
> 	done"
> 
> With this setup, I get a deadlock in a few iterations of fio.
> 
> > > I fixed it with this patch - with this patch, the test passes and there
> > > are no longer any deadlocks. I didn't want to add a spinlock to the hot
> > > path, so I take it only if some of the condition suggests that queuing may
> > > be required.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > > Cc: stable@xxxxxxxxxxxxxxx
> > The commit message requires a bit of rework to describe the change.
> 
> I improved the message and I send a second version of the patch.
> 
> > Also, deserves a fixes tag.
> 
> "Fixes" tag is for something that worked and that was broken in some 
> previous commit.
That's right.

> A quick search through git shows that QAT backlogging was 
> broken since the introduction of QAT.
The driver was moved from drivers/crypto/qat to drivers/crypto/intel/qat
that's why you see a single patch.
This fixes 386823839732 ("crypto: qat - add backlog mechanism")

> 
> > > 
> > > ---
> > >  drivers/crypto/intel/qat/qat_common/qat_algs_send.c |   31 ++++++++++++--------
> > >  1 file changed, 20 insertions(+), 11 deletions(-)
> > > 
> > > Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > > +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> > > @@ -40,16 +40,6 @@ void qat_alg_send_backlog(struct qat_ins
> > >  	spin_unlock_bh(&backlog->lock);
> > >  }
> > >  
> > > -static void qat_alg_backlog_req(struct qat_alg_req *req,
> > > -				struct qat_instance_backlog *backlog)
> > > -{
> > > -	INIT_LIST_HEAD(&req->list);
> > Is the initialization of an element no longer needed?
> 
> It was never needed. list_add_tail calls __list_add and __list_add 
> overwrites new->next and new->prev without reading them. So, there's no 
> need to initialize them.
> 
> > > -
> > > -	spin_lock_bh(&backlog->lock);
> > > -	list_add_tail(&req->list, &backlog->list);
> > > -	spin_unlock_bh(&backlog->lock);
> > > -}
> > > -
> > >  static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
> > >  {
> > >  	struct qat_instance_backlog *backlog = req->backlog;
> > > @@ -71,8 +61,27 @@ static int qat_alg_send_message_maybackl
> > >  	return -EINPROGRESS;
> > >  
> > >  enqueue:
> > > -	qat_alg_backlog_req(req, backlog);
> > > +	spin_lock_bh(&backlog->lock);
> > > +
> > > +	/* If any request is already backlogged, then add to backlog list */
> > > +	if (!list_empty(&backlog->list))
> > > +		goto enqueue2;
> > >  
> > > +	/* If ring is nearly full, then add to backlog list */
> > > +	if (adf_ring_nearly_full(tx_ring))
> > > +		goto enqueue2;
> > > +
> > > +	/* If adding request to HW ring fails, then add to backlog list */
> > > +	if (adf_send_message(tx_ring, fw_req))
> > > +		goto enqueue2;
> > In a nutshell, you are re-doing the same steps taking the backlog lock.
> > 
> > It should be possible to re-write it so that there is a function that
> > attempts enqueuing and if it fails, then the same is called again taking
> > the lock.
> > If you want I can rework it and resubmit.
> 
> Yes, if you prefer it this way, I reworked the patch so that we execute 
> the same code with or without the spinlock held.
> 
> > > +
> > > +	spin_unlock_bh(&backlog->lock);
> > > +	return -EINPROGRESS;
> > > +
> > > +enqueue2:
> > > +	list_add_tail(&req->list, &backlog->list);
> > > +
> > > +	spin_unlock_bh(&backlog->lock);
> > >  	return -EBUSY;
> > >  }
> > 
> > [1] https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1276510.html
> > [2] https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1428293.html
> > 
> > Regards,
> > 
> > -- 
> > Giovanni
> > 
> 
> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Subject: [PATCH] qat: fix deadlock in backlog processing
crypto: qat - fix ...
> 
> I was testing QAT with dm-crypt and I got some deadlocks.
> 
> The reason for the deadlocks is this: suppose that one of the "if"
> conditions in "qat_alg_send_message_maybacklog" is true and we jump to the
> "enqueue" label. At this point, an interrupt comes in and clears all
> pending messages. Now, the interrupt returns, we grab backlog->lock, add
> the message to the backlog, drop backlog->lock - and there is no one to
> remove the backlogged message out of the list and submit it.
> 
> In order to fix the bug, we must hold the spinlock backlog->lock when we 
> perform test for free space in the ring - so that the test for free space 
> and adding the request to a backlog is atomic and can't be interrupted by 
> an interrupt. Every completion interrupt calls qat_alg_send_backlog which 
> grabs backlog->lock, so holding this spinlock is sufficient to synchronize 
> with interrupts.
> 
> I didn't want to add a spinlock unconditionally to the hot path for 
> performance reasons, so I take it only if some of the condition suggests 
> that queuing may be required.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> 
> ---
>  drivers/crypto/intel/qat/qat_common/qat_algs_send.c |   23 ++++++++++----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> ===================================================================
> --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
> @@ -40,22 +40,14 @@ void qat_alg_send_backlog(struct qat_ins
>  	spin_unlock_bh(&backlog->lock);
>  }
>  
> -static void qat_alg_backlog_req(struct qat_alg_req *req,
> -				struct qat_instance_backlog *backlog)
> -{
> -	INIT_LIST_HEAD(&req->list);
> -
> -	spin_lock_bh(&backlog->lock);
> -	list_add_tail(&req->list, &backlog->list);
> -	spin_unlock_bh(&backlog->lock);
> -}
> -
>  static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
>  {
>  	struct qat_instance_backlog *backlog = req->backlog;
>  	struct adf_etr_ring_data *tx_ring = req->tx_ring;
>  	u32 *fw_req = req->fw_req;
> +	bool locked = false;
>  
> +repeat:
>  	/* If any request is already backlogged, then add to backlog list */
>  	if (!list_empty(&backlog->list))
>  		goto enqueue;
> @@ -68,11 +60,20 @@ static int qat_alg_send_message_maybackl
>  	if (adf_send_message(tx_ring, fw_req))
>  		goto enqueue;
>  
> +	if (unlikely(locked))
> +		spin_unlock_bh(&backlog->lock);
>  	return -EINPROGRESS;
>  
>  enqueue:
> -	qat_alg_backlog_req(req, backlog);
> +	if (!locked) {
> +		spin_lock_bh(&backlog->lock);
> +		locked = true;
> +		goto repeat;
> +	}
> +
> +	list_add_tail(&req->list, &backlog->list);
>  
> +	spin_unlock_bh(&backlog->lock);
>  	return -EBUSY;
>  }
>  
> 
Thanks - when I proposed the rework I was thinking at a solution without
gotos. Here is a draft:
------------8<----------------
 .../intel/qat/qat_common/qat_algs_send.c      | 40 ++++++++++---------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
index bb80455b3e81..18c6a233ab96 100644
--- a/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
+++ b/drivers/crypto/intel/qat/qat_common/qat_algs_send.c
@@ -40,17 +40,7 @@ void qat_alg_send_backlog(struct qat_instance_backlog *backlog)
 	spin_unlock_bh(&backlog->lock);
 }
 
-static void qat_alg_backlog_req(struct qat_alg_req *req,
-				struct qat_instance_backlog *backlog)
-{
-	INIT_LIST_HEAD(&req->list);
-
-	spin_lock_bh(&backlog->lock);
-	list_add_tail(&req->list, &backlog->list);
-	spin_unlock_bh(&backlog->lock);
-}
-
-static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
+static bool qat_alg_try_enqueue(struct qat_alg_req *req)
 {
 	struct qat_instance_backlog *backlog = req->backlog;
 	struct adf_etr_ring_data *tx_ring = req->tx_ring;
@@ -58,22 +48,36 @@ static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
 
 	/* If any request is already backlogged, then add to backlog list */
 	if (!list_empty(&backlog->list))
-		goto enqueue;
+		return false;
 
 	/* If ring is nearly full, then add to backlog list */
 	if (adf_ring_nearly_full(tx_ring))
-		goto enqueue;
+		return false;
 
 	/* If adding request to HW ring fails, then add to backlog list */
 	if (adf_send_message(tx_ring, fw_req))
-		goto enqueue;
+		return false;
 
-	return -EINPROGRESS;
+	return true;
+}
 
-enqueue:
-	qat_alg_backlog_req(req, backlog);
 
-	return -EBUSY;
+static int qat_alg_send_message_maybacklog(struct qat_alg_req *req)
+{
+	struct qat_instance_backlog *backlog = req->backlog;
+	int ret = -EINPROGRESS;
+
+	if (qat_alg_try_enqueue(req))
+		return ret;
+
+	spin_lock_bh(&backlog->lock);
+	if (!qat_alg_try_enqueue(req)) {
+		list_add_tail(&req->list, &backlog->list);
+		ret = -EBUSY;
+	}
+	spin_unlock_bh(&backlog->lock);
+
+	return ret;
 }
 
 int qat_alg_send_message(struct qat_alg_req *req)
-- 
2.41.0





[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]
  Powered by Linux