On Fri, 2022-03-11 at 12:57 -0500, David Jeffery wrote:
> With fast infiniband networks and rdma through isert, the isert version of
> an iSCSI target can get itself into a deadlock condition from when
> max_cmd_sn updates are pushed to the client versus when commands are fully
> released after rdma completes.
>
> iscsit preallocates a limited number of iscsi_cmd structs used for any
> commands from the initiator. While the iscsi window would normally be
> expected to limit the number used by normal SCSI commands, isert can exceed
> this limit with commands waiting finally completion. max_cmd_sn gets
> incremented and pushed to the client on sending the target's final
> response, but the iscsi_cmd won't be freed for reuse until after all rdma
> is acknowledged as complete.
>
> This allows more new commands to come in even as older commands are not yet
> released. With enough commands on the initiator wanting to be sent, this can
> result in all iscsi_cmd structs being allocated and used for SCSI commands.
>
> And once all are allocated, isert can deadlock when another new command is
> received. Its receive processing waits for an iscsi_cmd to become available.
> But this also stalls processing of the completions which would result in
> releasing an iscsi_cmd, resulting in a deadlock.
>
> This small patch series prevents this issue by altering when and how
> max_cmd_sn changes are reported to the initiator for isert. It gets delayed
> until iscsi_cmd release instead of when sending a final response.
>
> To prevent failure or large delays for informing the initiator of changes to
> max_cmd_sn, NOPIN is used as a method to inform the initiator should the
> difference between internal max_cmd_sn and what has been passed to the
> initiator grow too large.
>
> David Jeffery (2):
>   isert: support for unsolicited NOPIN with no response.
>   iscsit: increment max_cmd_sn for isert on command release
>
>  drivers/infiniband/ulp/isert/ib_isert.c    | 11 ++++++-
>  drivers/target/iscsi/iscsi_target.c        | 18 +++++------
>  drivers/target/iscsi/iscsi_target_device.c | 35 +++++++++++++++++++++-
>  drivers/target/iscsi/iscsi_target_login.c  |  1 +
>  drivers/target/iscsi/iscsi_target_util.c   |  5 +++-
>  drivers/target/iscsi/iscsi_target_util.h   |  1 +
>  include/target/iscsi/iscsi_target_core.h   |  8 +++++
>  include/target/iscsi/iscsi_transport.h     |  1 +
>  8 files changed, 68 insertions(+), 12 deletions(-)
>

This patch has had exhaustive testing in our lab and finally at a
customer. With 40GB FDR we could not reproduce this issue; when we moved
to 100G EDR it showed up. It has been tested extensively for many days on
two separate installations. The patch corrected all the stalls and
problems seen.

Thanks David for sending this.

Regards
Laurence Oberman
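
For anyone following the window accounting described in the cover letter, here is a rough toy model of it. This is not the kernel code and not the patch itself: simulate_burst(), WOULD_BLOCK and the pool and window sizes are all made up for illustration. It assumes the worst case from the description, where the initiator always has more commands queued and no RDMA completion is processed during the burst, so no command struct is released.

```c
#include <stdbool.h>

/* Hypothetical return value: the receive path would wait for a free struct. */
#define WOULD_BLOCK (-1)

/*
 * Toy model (an assumption for illustration, not iscsit/isert code).
 * pool_size:         number of preallocated command structs
 * window:            commands the initiator may have outstanding
 * burst:             commands the initiator wants to send
 * credit_on_release: false = window reopens when the final response is sent,
 *                    true  = window reopens only when the struct is released
 * Returns the number of structs in use at the end of the burst, or
 * WOULD_BLOCK if a command arrives while the pool is exhausted.
 */
int simulate_burst(int pool_size, int window, int burst, bool credit_on_release)
{
    int in_use = 0;        /* structs allocated and not yet released */
    int credits = window;  /* commands the initiator may still send */

    for (int i = 0; i < burst; i++) {
        if (credits == 0)
            break;               /* initiator must wait for a window update */
        if (in_use == pool_size)
            return WOULD_BLOCK;  /* receive waits, completions stall behind it */

        in_use++;
        credits--;

        /* Final response goes out here; RDMA completion is still pending,
         * so the struct stays allocated for the rest of the burst. */
        if (!credit_on_release)
            credits++;           /* window reopened before the struct is freed */
    }
    return in_use;
}
```

With a pool of 4 and a window of 2, crediting at response time lets the fifth command arrive with no struct free, while crediting at release time stops the initiator at 2 outstanding commands. The model leaves out the NOPIN update path entirely.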