Hi Kashyap Desai,
Unfortunately, there is a wider problem in iser that we do the local
invalidation if any, after we complete the iscsi task.
So the right solution is to do the logic we have in NVMe/RDMA that
checks if remote invalidation happened and if not, it does local
invalidation.
And the task is released after getting the send_completion and
local_invalidation/remote invalidation.
There is some infrastructure work needed to be done there.
Sagi,
Please remind me few things regarding the iSCSI bidir cmds.
In case of bidir IO cmd, if we need to use a reg_mr for it - I see a
potential problem there.
Is this scenario possible ?
I'm asking because if it is possible, so what happens in remote
invalidation ? what key is invalidated ? the read_key or the write_key ?
in ib_isert there is a priority to the read_key (is it by the spec ?).
We don't consider this in the iser recv completion and the remote
invalidation checker logic.
If it's not possible we should re-design few components in the code and
fix the issue that we do local_invalidation of cmd N during the send of
cmd N + 1.
Cheers,
-Max.
On 8/24/2022 3:42 PM, Kashyap Desai wrote:
In [0], Max Gurtovoy already explained a future plan to add reference
count.
Below example has Initiator Task Tag = 0x1e. We are tracking the same ITT.
[177617.255969] session508: iscsi_prep_scsi_cmd_pdu iscsi prep [write cid 0 sc 000000009d6ff976 cdb 0x2a itt 0x1e len 8192 cmdsn 90 win 64]
[177617.255982] iser: iscsi_iser_task_xmit: cmd [itt 1e total 8192 imm 8192 unsol_data 0
(1) >> [177617.255985] iser: iscsi_iser_task_xmit: ctask xmit [cid 0 itt 0x1e]
[177617.255992] iser: iser_reg_dma: Single DMA entry: lkey=0xffffffff, rkey=0x0, addr=0x9c5de000, length=0x2000
[177617.255995] iser: iser_prepare_write_cmd: Cmd itt:30, WRITE, adding imm.data sz: 8192
[177617.256002] iser: iser_send_command: from iser_send_command iser_task ffff9e39dad96898 tx_count 1
(2) >> [177617.256016] iser: iser_cmd_comp: from iser_cmd_comp iser_task ffff9e39dad96898 tx_count 0
(3) >> [177617.256045] iser: iser_task_rsp: op 0x21 itt 0x1e dlen 0
[177617.256049] session508: __iscsi_complete_pdu [op 0x21 cid 0 itt 0x1e len 0]
[177617.256052] session508: iscsi_scsi_cmd_rsp cmd rsp done [sc 000000009d6ff976 res 0 itt 0x1e]
[177617.256055] session508: iscsi_complete_task complete task itt 0x1e state 3 sc 000000009d6ff976
(4) >> [177617.256057] session508: iscsi_free_task freeing task itt 0x1e state 1 sc 000000009d6ff976
(5) >> [177617.256067] bnxt_en 0000:21:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e address=0x41137800 flags=0x0000]
[177617.256068] bnxt_en 0000:21:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e address=0xc1595098 flags=0x0000]
(1) Is when iser stack has posted TX DESC type = ISCSI_TX_SCSI_COMMAND having two SGL.
sgl[0] is contain iscsi_header and sgl[1] is scsi data (of length 8K)
of scsi command cdb = WRITE_10
(2) In a good case, TX completion is received,
but in some cases TX completion is not yet received to the iser stack.
(3) Is when iser stack gets completion of RX DESC of ITT = 0x1e.
This is actual scsi completion of TX_DESC from target.
(4) Is when the iser stack destroys all TX DESC resources associated with ITT = 0x1e.
(5) Is when HCA is still accessing WQE entry of ITT = 0x1e (possible in retransmission path).
After (4), sgl[0] and sgl[1] dma addresses are unmapped and not allowed to be used by HCA.
Get the iscsi_task reference count if TX DESC is successfully posted.
Put the iscsi_task reference count once TX_DESC is completed.
This method will make sure iscsi_free_task will be called only after TX
and RX DESC received for the task->itt.
This patch depends upon [0] as it required signaled completion.
[0] https://patchwork.kernel.org/project/linux-rdma/patch/20211215135721.3662-5-mgurtovoy@xxxxxxxxxx/
Cc: "Jason Gunthorpe" <jgg@xxxxxxxxxx>
Cc: "Max Gurtovoy" <mgurtovoy@xxxxxxxxxx>
Cc: "Sagi Grimberg" <sagi@xxxxxxxxxxx>
Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
---
drivers/infiniband/ulp/iser/iscsi_iser.c | 5 ++++-
drivers/infiniband/ulp/iser/iser_initiator.c | 6 ++++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 620ae5b2d80d..f1704fef9dc8 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -339,9 +339,12 @@ static int iscsi_iser_task_xmit(struct iscsi_task *task)
/* Send the cmd PDU */
if (!iser_task->command_sent) {
+ iscsi_get_task(task);
error = iser_send_command(conn, task);
- if (error)
+ if (error) {
+ iscsi_put_task(task);
goto iscsi_iser_task_xmit_exit;
+ }
iser_task->command_sent = 1;
}
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 7b83f48f60c5..1f0b1601b3b3 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -669,6 +669,12 @@ void iser_task_rsp(struct ib_cq *cq, struct ib_wc *wc)
void iser_cmd_comp(struct ib_cq *cq, struct ib_wc *wc)
{
+ struct iser_tx_desc *desc = iser_tx(wc->wr_cqe);
+ struct iscsi_task *task;
+ task = (void *)desc - sizeof(struct iscsi_task);
+
+ iscsi_put_task(task);
+
if (unlikely(wc->status != IB_WC_SUCCESS))
iser_err_comp(wc, "command");
}