Re: [PATCH 0/3] target: Fix queue-full callback error signaling

Hi Bryant,

(Re-adding the target-devel CC)

On Mon, 2017-07-31 at 16:06 -0500, Bryant G. Ly wrote:
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>

Hi folks,

This series contains target-core queue-full + iscsi-target
callback fixes for a bug reported by Steve and Potnuri
during recent v4.8.y iWARP/iser-target testing.

The first patch fixes target-core queue-full response leaks for
non -EAGAIN / -ENOMEM errors.  It adds a new state,
TRANSPORT_COMPLETE_QF_ERR, used to internally generate CHECK_CONDITION
for unknown fabric callback errors, so that fabric data-transfer
callbacks are not retried for this special case.
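
For reference, a rough sketch of the state addition (the real definition
lives in include/target/target_core_base.h, per the diffstat below; the
surrounding entries are abbreviated and the exact values are shown for
illustration only):

        /* Queue-full command states; TRANSPORT_COMPLETE_QF_ERR is the new entry */
        enum transport_state_table {
                TRANSPORT_COMPLETE_QF_WP  = 18, /* retry ->write_pending() */
                TRANSPORT_COMPLETE_QF_OK  = 19, /* retry ->queue_data_in() / ->queue_status() */
                TRANSPORT_COMPLETE_QF_ERR = 20  /* new: send CHECK_CONDITION, do not retry */
        };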

This means all non -EAGAIN / -ENOMEM fabric callback errors
during target_core_fabric_ops for:

    *) ->write_pending()
    *) ->queue_data_in()
    *) ->queue_status()

will result in CHECK_CONDITION + LOGICAL_UNIT_COMMUNICATION_FAILURE
if no non-zero se_cmd->scsi_status was previously set.
It also means target-core retries ->queue_status() errors indefinitely,
or until session shutdown explicitly stops outstanding I/O.

The remaining changes propagate iscsit_transport response failures
back to target-core queue-full handling, and update iser-target to
propagate isert_rdma_rw_ctx_post() errors from the RDMA R/W API back
to target-core as well.
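
As a rough sketch of what that propagation looks like on the LIO
callback side (not the verbatim patch; the callback and field names are
taken from drivers/target/iscsi/iscsi_target_configfs.c), the idea is
simply to return the iscsit_transport result instead of dropping it:

        static int lio_queue_data_in(struct se_cmd *se_cmd)
        {
                struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd);
                struct iscsi_conn *conn = cmd->conn;

                cmd->i_state = ISTATE_SEND_DATAIN;
                /* Hand any iscsit_transport error back to target-core queue-full */
                return conn->conn_transport->iscsit_queue_data_in(conn, cmd);
        }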

Please review.

--nab

Nicholas Bellinger (3):
    target: Fix unknown fabric callback queue-full errors
    iscsi-target: Propigate queue_data_in + queue_status errors
    iser-target: Fix queue-full response handling

   drivers/infiniband/ulp/isert/ib_isert.c      |  53 +++++++++-----
   drivers/target/iscsi/iscsi_target.c          |   3 +-
   drivers/target/iscsi/iscsi_target_configfs.c |  13 ++--
   drivers/target/iscsi/iscsi_target_util.c     |   5 +-
   drivers/target/iscsi/iscsi_target_util.h     |   2 +-
   drivers/target/target_core_transport.c       | 102 ++++++++++++++++++---------
   include/target/target_core_base.h            |   1 +
   7 files changed, 114 insertions(+), 65 deletions(-)

Hi Nick,

I was sent this trace, and to me it looks like our driver isn't getting a response back from target-core when we return -EIO
from ibmvscsis_write_pending after a failed srp_transfer_data(). That then locks us up on a wait_for_completion, since our
active_queue/scheduled_queues aren't empty, so we keep waiting. Do you think there is still a leak for outgoing responses
when our fabric driver does not return -EAGAIN or -ENOMEM?

Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.123071] dev[c0000001fe6be000]: Backstore name '830530a9-2d65-317c-bea0-9ebc82739b64' is too long for INQUIRY_MODEL, truncating to 16 bytes
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592683] ibmvscsis 30000006: rdma: h_copy_rdma failed, rc -11
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592699] ibmvscsis: srp_transfer_data() failed: -11
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592705] Got unknown fabric queue status: -5
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592713] ibmvscsis 30000006: build_response: error copying to client, rc -11, flags 0x140, state 0x10
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592724] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592731] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592738] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592745] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:35 tuk6r1phn1 kernel: [494628.620264] ibmvscsis: connection lost with outstanding work
Jul 20 16:53:35 tuk6r1phn1 kernel: [494628.666964] HVCS: Closed vty-server@30000008 and partner vty@30000000:2 connection.
<SNIP>
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428287] INFO: task kworker/1:4:6743 blocked for more than 120 seconds.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428296]       Tainted: G           OE   4.8.0-58-generic #63~16.04.1-Ubuntu
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428301] kworker/1:4     D 0000000000000000     0  6743      2 0x00000800
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428310] Workqueue: ibmvscsis30000006 ibmvscsis_disconnect [ibmvscsis]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428310] Call Trace:
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428313] [c0000001ae0c36f0] [c0000000014ef090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428316] [c0000001ae0c38c0] [c00000000001a610] __switch_to+0x280/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428319] [c0000001ae0c3920] [c000000000b4b6e4] __schedule+0x2f4/0x990
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428321] [c0000001ae0c3a00] [c000000000b4bdcc] schedule+0x4c/0xc0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428322] [c0000001ae0c3a30] [c000000000b50404] schedule_timeout+0x284/0x480
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428324] [c0000001ae0c3b20] [c000000000b4cd9c] wait_for_common+0xec/0x240
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428326] [c0000001ae0c3ba0] [d000000008f66540] ibmvscsis_disconnect+0x240/0x9d0 [ibmvscsis]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428329] [c0000001ae0c3c50] [c0000000000f1c28] process_one_work+0x1e8/0x5b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428330] [c0000001ae0c3ce0] [c0000000000f2098] worker_thread+0xa8/0x650
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428332] [c0000001ae0c3d80] [c0000000000fa864] kthread+0x114/0x140
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428334] [c0000001ae0c3e30] [c0000000000098f0] ret_from_kernel_thread+0x5c/0x6c
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428338] INFO: task targetcli:9627 blocked for more than 120 seconds.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428341]       Tainted: G           OE   4.8.0-58-generic #63~16.04.1-Ubuntu
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428343] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428345] targetcli       D 00003fff9bf3fd24     0  9627   9626 0x00040000
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428347] Call Trace:
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428349] [c0000001ad647700] [c0000000014ef090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428351] [c0000001ad6478d0] [c00000000001a610] __switch_to+0x280/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428352] [c0000001ad647930] [c000000000b4b6e4] __schedule+0x2f4/0x990
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428354] [c0000001ad647a10] [c000000000b4bdcc] schedule+0x4c/0xc0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428355] [c0000001ad647a40] [c000000000b50404] schedule_timeout+0x284/0x480
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428357] [c0000001ad647b30] [c000000000b4cd9c] wait_for_common+0xec/0x240
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428366] [c0000001ad647bb0] [d0000000079a4d60] transport_clear_lun_ref+0x50/0x70 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428373] [c0000001ad647be0] [d00000000799d0e8] core_tpg_remove_lun+0x58/0x160 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428379] [c0000001ad647c20] [d0000000079895fc] core_dev_del_lun+0x4c/0x150 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428385] [c0000001ad647cb0] [d00000000798b590] target_fabric_port_unlink+0x70/0x90 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428388] [c0000001ad647ce0] [d000000004fd5bb8] configfs_unlink+0x188/0x2c0 [configfs]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428390] [c0000001ad647d40] [c000000000338774] vfs_unlink+0x104/0x2a0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428392] [c0000001ad647d90] [c00000000033fe08] do_unlinkat+0x378/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428393] [c0000001ad647e30] [c000000000009584] system_call+0x38/0xec
Jul 20 16:58:33 tuk6r1phn1 kernel: [494927.259770] INFO: task kworker/1:4:6743 blocked for more than 120 seconds.


On v4.10.y and earlier code without commit fa7e25cf13, returning
anything other than -EAGAIN or -ENOMEM from fabric code via
TFO->write_pending() will result in a lost response and subsequent hung
tasks.

In ibmvscsis's case, I see -EAGAIN was changed to -EIO in commit
0edcc47cdb7 to avoid the endless loop on v4.9.y code when
srp_transfer_data() would fail repeatedly.

That said, with commit fa7e25cf13 in place, anything other than -EAGAIN
or -ENOMEM from TFO->write_pending() will reach the following code in
transport_handle_queue_full():

         /*
          * -EAGAIN or -ENOMEM signals retry of ->write_pending() and/or
          * ->queue_data_in() callbacks from new process context.
          *
          * Otherwise for other errors, transport_complete_qf() will send
          * CHECK_CONDITION via ->queue_status() instead of attempting to
          * retry associated fabric driver data-transfer callbacks.
          */
         if (err == -EAGAIN || err == -ENOMEM) {
                 cmd->t_state = (write_pending) ? TRANSPORT_COMPLETE_QF_WP :
                                                  TRANSPORT_COMPLETE_QF_OK;
         } else {
                 pr_warn_ratelimited("Got unknown fabric queue status: %d\n", err);
                 cmd->t_state = TRANSPORT_COMPLETE_QF_ERR;
         }

which sets TRANSPORT_COMPLETE_QF_ERR and invokes the following in
transport_complete_qf(), to immediately complete se_cmd with any
fabric-driver-provided ->scsi_status, or to return CHECK_CONDITION with
LOGICAL UNIT COMMUNICATION FAILURE sense data if no ->scsi_status is set:

         transport_complete_task_attr(cmd);
         /*
          * If a fabric driver ->write_pending() or ->queue_data_in() callback
          * has returned neither -ENOMEM or -EAGAIN, assume it's fatal and
          * the same callbacks should not be retried.  Return CHECK_CONDITION
          * if a scsi_status is not already set.
          *
          * If a fabric driver ->queue_status() has returned non zero, always
          * keep retrying no matter what..
          */
         if (cmd->t_state == TRANSPORT_COMPLETE_QF_ERR) {
                 if (cmd->scsi_status)
                         goto queue_status;

                 cmd->se_cmd_flags |= SCF_EMULATED_TASK_SENSE;
                 cmd->scsi_status = SAM_STAT_CHECK_CONDITION;
                 cmd->scsi_sense_length  = TRANSPORT_SENSE_BUFFER;
                 translate_sense_reason(cmd, TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
                 goto queue_status;
         }

So for ibmvscsis on v4.8.y code, I'd backport commit fa7e25cf13 to your
v4.8.y tree, and if/when a specific srp_transfer_data() failure is
retriable, return -EAGAIN.  Or, if the srp_transfer_data() failure is
not retriable and se_cmd should be completed immediately with exception
status, return -EIO.
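
In other words, with fa7e25cf13 applied, the fabric's ->write_pending()
decision can look roughly like the sketch below (a hypothetical helper
stands in for the actual srp_transfer_data() call; this shows the idea,
not the real ibmvscsis code):

        static int example_write_pending(struct se_cmd *se_cmd)
        {
                int rc = example_transfer_data(se_cmd);  /* hypothetical helper */

                if (!rc)
                        return 0;

                /* Transient failure: target-core re-queues ->write_pending() */
                if (rc == -EAGAIN || rc == -ENOMEM)
                        return rc;

                /* Fatal failure: target-core completes se_cmd with CHECK_CONDITION */
                return -EIO;
        }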

Btw, I'd been thinking about submitting commit fa7e25cf13 for stable
backports for the iser-target consumers that need it.  If it addresses
this ibmvscsis case as well on <= v4.10.y code, that's even more reason
to add it to stable.


Hi Nick,

Thanks for the explanation. The thing is, my tree already has commit fa7e25cf13.

https://github.com/powervm/ibmvscsis/tree/Yakkety_develop

That tree contains commit fa7e25cf13 as well as commit 0edcc47cdb7, which changes
the write_pending srp_transfer_data() failure return to -EIO.

The trace shows a 4.8.0-58-generic Ubuntu kernel, but we put a DKMS layer on top that
overrides all of LIO + ibmvscsis using the files from that GitHub branch (which is based
off of Yakkety-master).

So with all that in mind, I sent the original message because I think the patch might have
a bug in it somewhere. The reason I think that is that the person who first tested this
did not have the DKMS package applied on top that overwrites LIO + ibmvscsis.

The difference was that all of the ibmvscsis files were already fully backported into 4.8.y
kernels. So they had commit 0edcc47cdb7 but not commit fa7e25cf13, and everything was
functioning fine. Then they applied the DKMS package, and now they hit this problem all the time.


-Bryant
