Re: blktests failures with v6.4

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/13/23 09:48, Sagi Grimberg wrote:

#3: nvme/003 (fabrics transport)

     When nvme test group is run with trtype=rdma or tcp, the test case fails      due to lockdep WARNING "possible circular locking dependency detected".      Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] but it
     needs more discussion.

     [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@xxxxxxx/

This patch is unfortunately incorrect and buggy.

This will likely make the issue go away, but adds another
old issue where a client can DDOS a target by bombarding it
with connect/disconnect. When releases are async and we don't
have any back-pressure, it is likely to happen.
--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 4597bca43a6d..8b4f4aa48206 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id
*cm_id,
                 goto put_device;
         }

-       if (queue->host_qid == 0) {
-               /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
-       }
-
         ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
         if (ret) {
                 /*
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 868aa4de2e4c..c8cfa19e11c7 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq
*sq)
         struct nvmet_tcp_queue *queue =
                 container_of(sq, struct nvmet_tcp_queue, nvme_sq);

-       if (sq->qid == 0) {
-               /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
-       }
-
         queue->nr_cmds = sq->size * 2;
         if (nvmet_tcp_alloc_cmds(queue))
                 return NVME_SC_INTERNAL;
--

Thanks Sagi, I tried the patch above and confirmed the lockdep WARN disappears
for both rdma and tcp. It indicates that the flush_workqueue(nvmet_wq)
introduced the circular lock dependency.

Thanks for confirming. This was expected.

I also found the two commits below
record why the flush_workqueue(nvmet_wq) was introduced.

  777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")   8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")

The second patch is unrelated, before we used a global workqueue and
fundamentally had the same issue.

The left question is how to avoid both the connect/disconnect bombarding DDOS
and the circular lock possibility related to the nvmet_wq completion.

I don't see any way to synchronize connects with releases without moving connect sequences to a dedicated thread. Which in my mind is undesirable.

The only solution I can think of is to fail a host connect expecting the
host to reconnect and throttle this way, but that would lead to spurious
connect failures (at least from the host PoV).

Maybe we can add a NOT_READY connect error code in nvme for that...

You know, I have been seeing the very same lockdep warning during TLS testing; wasn't sure, though, if it's a generic issue or something I've introduced with the TLS logic.

I guess we can solve this by adding another NVMET_TCP_Q_INIT state before NVMET_TCP_Q_CONNECTING; then we can flip the state from INIT to CONNECTING in nvmet_tcp_alloc_queue():

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ed98df72c76b..e6f699a44128 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -121,6 +121,7 @@ struct nvmet_tcp_cmd {
 };

 enum nvmet_tcp_queue_state {
+       NVMET_TCP_Q_INIT,
        NVMET_TCP_Q_CONNECTING,
        NVMET_TCP_Q_LIVE,
        NVMET_TCP_Q_DISCONNECTING,
@@ -1274,7 +1275,8 @@ static int nvmet_tcp_try_recv(struct nvmet_tcp_queue *queue, static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
 {
        spin_lock(&queue->state_lock);
-       if (queue->state != NVMET_TCP_Q_DISCONNECTING) {
+       if (queue->state != NVMET_TCP_Q_INIT &&
+           queue->state != NVMET_TCP_Q_DISCONNECTING) {
                queue->state = NVMET_TCP_Q_DISCONNECTING;
                queue_work(nvmet_wq, &queue->release_work);
        }
@@ -1625,7 +1627,7 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
        queue->port = port;
        queue->nr_cmds = 0;
        spin_lock_init(&queue->state_lock);
-       queue->state = NVMET_TCP_Q_CONNECTING;
+       queue->state = NVMET_TCP_Q_INIT;
        INIT_LIST_HEAD(&queue->free_list);
        init_llist_head(&queue->resp_list);
        INIT_LIST_HEAD(&queue->resp_send_list);
@@ -1832,10 +1834,12 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
        struct nvmet_tcp_queue *queue =
                container_of(sq, struct nvmet_tcp_queue, nvme_sq);

-       if (sq->qid == 0) {
-               /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
+       spin_lock(&queue->state_lock);
+       if (queue->state != NVMET_TCP_Q_INIT) {
+               spin_unlock(&queue->state_lock);
+               return NVME_SC_NOT_READY;
        }
+       spin_unlock(&queue->state_lock);

        queue->nr_cmds = sq->size * 2;
        if (nvmet_tcp_alloc_cmds(queue))

With that we'll return 'not ready' whenever we hit this condition, but that should be fine as we would've crashed anyway with the old code.

Hmm?

Cheers,

Hannes



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux