On Jul 09, 2023 / 17:32, Sagi Grimberg wrote:
>
> > #3: nvme/003 (fabrics transport)
> >
> >    When nvme test group is run with trtype=rdma or tcp, the test case fails
> >    due to lockdep WARNING "possible circular locking dependency detected".
> >    Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] but it
> >    needs more discussion.
> >
> >    [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@xxxxxxx/
>
> This patch is unfortunately incorrect and buggy.
>
> This will likely make the issue go away, but adds another
> old issue where a client can DDOS a target by bombarding it
> with connect/disconnect. When releases are async and we don't
> have any back-pressure, it is likely to happen.
> --
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 4597bca43a6d..8b4f4aa48206 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id
> *cm_id,
>                  goto put_device;
>          }
>
> -       if (queue->host_qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> -       }
> -
>          ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>          if (ret) {
>                  /*
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 868aa4de2e4c..c8cfa19e11c7 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq
> *sq)
>          struct nvmet_tcp_queue *queue =
>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>
> -       if (sq->qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> -       }
> -
>          queue->nr_cmds = sq->size * 2;
>          if (nvmet_tcp_alloc_cmds(queue))
>                  return NVME_SC_INTERNAL;
> --

Thanks Sagi, I tried the patch above and confirmed the lockdep WARN disappears
for both rdma and tcp. It indicates that the flush_workqueue(nvmet_wq)
introduced the circular lock dependency.
I also found that the two commits below record why the
flush_workqueue(nvmet_wq) was introduced.

 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
 8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")

The remaining question is how to avoid both the connect/disconnect bombarding
DDOS and the circular lock possibility related to the nvmet_wq completion.