Re: [PATCH 0/3] target: Fix queue-full callback error signaling

Hi Bryant,

(Re-adding the target-devel CC)

On Mon, 2017-07-31 at 16:06 -0500, Bryant G. Ly wrote:
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>

Hi folks,

This series contains target-core queue-full + iscsi-target
callback fixes for a bug reported by Steve and Potnuri
during recent v4.8.y iWARP/iser-target testing.

The first patch fixes target-core queue-full handling response
leaks with non -EAGAIN / -ENOMEM errors.  It uses a new state
TRANSPORT_COMPLETE_QF_ERR to internally generate CHECK_CONDITION
for unknown fabric callback errors, to avoid attempting retry
of fabric data-transfer callbacks for this special case.

This means all non -EAGAIN / -ENOMEM fabric callback errors
during target_core_fabric_ops for:

    *) ->write_pending()
    *) ->queue_data_in()
    *) ->queue_status()

will result in CHECK_CONDITION + LOGICAL_UNIT_COMMUNICATION_FAILURE,
if no non-zero se_cmd->scsi_status was previously set.
It also means target-core retries ->queue_status() errors indefinitely,
or until session shutdown explicitly stops outstanding I/O.
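
For example (illustrative sketch only, not from any in-tree fabric driver;
the response-posting helper is made up, and the usual target_core_fabric.h
definitions are assumed), a fabric ->queue_status() looks roughly like:

static int example_queue_status(struct se_cmd *se_cmd)
{
        /* Hypothetical helper that posts SCSI status + sense back to the
         * initiator and returns 0 or a negative errno.
         */
        int ret = example_post_response(se_cmd);

        /* Per the above, any error returned here is retried by target-core
         * from process context until it succeeds, or until session shutdown
         * stops the outstanding I/O.
         */
        return ret;
}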

The remaining changes are for propagating iscsit_transport
response failure back to target-core queue-full, and updating
iser-target to propagate isert_rdma_rw_ctx_post() errors from
RDMA R/W API back to target-core as well.

Please review.

--nab

Nicholas Bellinger (3):
    target: Fix unknown fabric callback queue-full errors
    iscsi-target: Propigate queue_data_in + queue_status errors
    iser-target: Fix queue-full response handling

   drivers/infiniband/ulp/isert/ib_isert.c      |  53 +++++++++-----
   drivers/target/iscsi/iscsi_target.c          |   3 +-
   drivers/target/iscsi/iscsi_target_configfs.c |  13 ++--
   drivers/target/iscsi/iscsi_target_util.c     |   5 +-
   drivers/target/iscsi/iscsi_target_util.h     |   2 +-
   drivers/target/target_core_transport.c       | 102 ++++++++++++++++++---------
   include/target/target_core_base.h            |   1 +
   7 files changed, 114 insertions(+), 65 deletions(-)

Hi Nick,

I was sent this trace, and to me it looks like our driver isn't getting a
response back from target-core when we return -EIO from
ibmvscsis_write_pending() after a failed srp_transfer_data().  This then
locks us up on a wait_for_completion(), since our active_queue /
scheduled_queues aren't empty, so we keep waiting.  Do you think there is
still a leak for outgoing responses when our fabric driver returns
something other than -EAGAIN or -ENOMEM?

Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.123071] dev[c0000001fe6be000]: Backstore name '830530a9-2d65-317c-bea0-9ebc82739b64' is too long for INQUIRY_MODEL, truncating to 16 bytes
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592683] ibmvscsis 30000006: rdma: h_copy_rdma failed, rc -11
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592699] ibmvscsis: srp_transfer_data() failed: -11
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592705] Got unknown fabric queue status: -5
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592713] ibmvscsis 30000006: build_response: error copying to client, rc -11, flags 0x140, state 0x10
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592724] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592731] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592738] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592745] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:35 tuk6r1phn1 kernel: [494628.620264] ibmvscsis: connection lost with outstanding work
Jul 20 16:53:35 tuk6r1phn1 kernel: [494628.666964] HVCS: Closed vty-server@30000008 and partner vty@30000000:2 connection.
<SNIP>
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428287] INFO: task kworker/1:4:6743 blocked for more than 120 seconds.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428296] Tainted: G OE 4.8.0-58-generic #63~16.04.1-Ubuntu
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428301] kworker/1:4 D 0000000000000000 0 6743 2 0x00000800
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428310] Workqueue: ibmvscsis30000006 ibmvscsis_disconnect [ibmvscsis]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428310] Call Trace:
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428313] [c0000001ae0c36f0] [c0000000014ef090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428316] [c0000001ae0c38c0] [c00000000001a610] __switch_to+0x280/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428319] [c0000001ae0c3920] [c000000000b4b6e4] __schedule+0x2f4/0x990
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428321] [c0000001ae0c3a00] [c000000000b4bdcc] schedule+0x4c/0xc0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428322] [c0000001ae0c3a30] [c000000000b50404] schedule_timeout+0x284/0x480
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428324] [c0000001ae0c3b20] [c000000000b4cd9c] wait_for_common+0xec/0x240
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428326] [c0000001ae0c3ba0] [d000000008f66540] ibmvscsis_disconnect+0x240/0x9d0 [ibmvscsis]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428329] [c0000001ae0c3c50] [c0000000000f1c28] process_one_work+0x1e8/0x5b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428330] [c0000001ae0c3ce0] [c0000000000f2098] worker_thread+0xa8/0x650
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428332] [c0000001ae0c3d80] [c0000000000fa864] kthread+0x114/0x140
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428334] [c0000001ae0c3e30] [c0000000000098f0] ret_from_kernel_thread+0x5c/0x6c
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428338] INFO: task targetcli:9627 blocked for more than 120 seconds.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428341] Tainted: G OE 4.8.0-58-generic #63~16.04.1-Ubuntu
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428343] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428345] targetcli D 00003fff9bf3fd24 0 9627 9626 0x00040000
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428347] Call Trace:
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428349] [c0000001ad647700] [c0000000014ef090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428351] [c0000001ad6478d0] [c00000000001a610] __switch_to+0x280/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428352] [c0000001ad647930] [c000000000b4b6e4] __schedule+0x2f4/0x990
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428354] [c0000001ad647a10] [c000000000b4bdcc] schedule+0x4c/0xc0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428355] [c0000001ad647a40] [c000000000b50404] schedule_timeout+0x284/0x480
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428357] [c0000001ad647b30] [c000000000b4cd9c] wait_for_common+0xec/0x240
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428366] [c0000001ad647bb0] [d0000000079a4d60] transport_clear_lun_ref+0x50/0x70 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428373] [c0000001ad647be0] [d00000000799d0e8] core_tpg_remove_lun+0x58/0x160 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428379] [c0000001ad647c20] [d0000000079895fc] core_dev_del_lun+0x4c/0x150 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428385] [c0000001ad647cb0] [d00000000798b590] target_fabric_port_unlink+0x70/0x90 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428388] [c0000001ad647ce0] [d000000004fd5bb8] configfs_unlink+0x188/0x2c0 [configfs]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428390] [c0000001ad647d40] [c000000000338774] vfs_unlink+0x104/0x2a0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428392] [c0000001ad647d90] [c00000000033fe08] do_unlinkat+0x378/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428393] [c0000001ad647e30] [c000000000009584] system_call+0x38/0xec
Jul 20 16:58:33 tuk6r1phn1 kernel: [494927.259770] INFO: task kworker/1:4:6743 blocked for more than 120 seconds.


On v4.10.y and earlier code without commit fa7e25cf13, returning
anything other than -EAGAIN or -ENOMEM from fabric code via
TFO->write_pending() will result in a lost response and subsequent hung
tasks..

In ibmvscsi's case, I see -EAGAIN was changed to -EIO in commit
0edcc47cdb7 to avoid the endless loop on v4.9.y code, when the
srp_transfer_data() would fail repeatedly.

That said, with commit fa7e25cf13 in place, anything other than -EAGAIN
or -ENOMEM from TFO->write_pending() will reach the following code in
transport_handle_queue_full():

        /*
         * -EAGAIN or -ENOMEM signals retry of ->write_pending() and/or
         * ->queue_data_in() callbacks from new process context.
         *
         * Otherwise for other errors, transport_complete_qf() will send
         * CHECK_CONDITION via ->queue_status() instead of attempting to
         * retry associated fabric driver data-transfer callbacks.
         */
        if (err == -EAGAIN || err == -ENOMEM) {
                cmd->t_state = (write_pending) ? TRANSPORT_COMPLETE_QF_WP :
                                                 TRANSPORT_COMPLETE_QF_OK;
        } else {
                pr_warn_ratelimited("Got unknown fabric queue status: %d\n", err);
                cmd->t_state = TRANSPORT_COMPLETE_QF_ERR;
        }

which sets TRANSPORT_COMPLETE_QF_ERR, and invokes the following in
transport_complete_qf() to immediately complete se_cmd with any fabric
driver provided ->scsi_status, or return CHECK_CONDITION with logical
unit communication failure sense data if no ->scsi_status is set:

        transport_complete_task_attr(cmd);
        /*
         * If a fabric driver ->write_pending() or ->queue_data_in() callback
         * has returned neither -ENOMEM or -EAGAIN, assume it's fatal and
         * the same callbacks should not be retried.  Return CHECK_CONDITION
         * if a scsi_status is not already set.
         *
         * If a fabric driver ->queue_status() has returned non zero, always
         * keep retrying no matter what..
         */
        if (cmd->t_state == TRANSPORT_COMPLETE_QF_ERR) {
                if (cmd->scsi_status)
                        goto queue_status;

                cmd->se_cmd_flags |= SCF_EMULATED_TASK_SENSE;
                cmd->scsi_status = SAM_STAT_CHECK_CONDITION;
                cmd->scsi_sense_length = TRANSPORT_SENSE_BUFFER;
                translate_sense_reason(cmd, TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
                goto queue_status;
        }

So for ibmvscsi on v4.8.y code, I'd backport commit fa7e25cf13 to your
v4.8.y tree, and if/when a specific srp_transfer_data() failure is
retriable, return -EAGAIN.  Or, if the srp_transfer_data() failure is
not retriable and se_cmd should be completed immediately with exception
status, return -EIO.
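
As a rough sketch only (this is not the actual ibmvscsis code, and which
h_copy_rdma / srp_transfer_data() failures are transient is an assumption
here), that mapping would look something like:

static int example_write_pending(struct se_cmd *se_cmd)
{
        /* Hypothetical wrapper around srp_transfer_data() / h_copy_rdma. */
        int rc = example_transfer_data(se_cmd);

        if (rc == -EAGAIN)
                /* Assumed-transient failure: target-core re-queues and
                 * retries ->write_pending() from process context.
                 */
                return -EAGAIN;
        if (rc)
                /* Fatal failure: with commit fa7e25cf13, target-core
                 * completes se_cmd immediately with CHECK_CONDITION
                 * instead of retrying the callback.
                 */
                return -EIO;

        return 0;
}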

Btw, I'd been thinking about including commit fa7e25cf13 for stable
backports for the iser-target consumers that need it.  If it addresses
this ibmvscsi case as well on <= v4.10.y code, that's even more of a
reason to add it to stable.


Hi Nick,

Thanks for the explanation.  The thing is, my tree already has commit fa7e25cf13:

https://github.com/powervm/ibmvscsis/tree/Yakkety_develop

That tree contains commit fa7e25cf13, as well as commit 0edcc47cdb7, which
changes the write_pending srp_transfer_data() failure to return -EIO.

The trace shows the 4.8.0-58-generic Ubuntu kernel, but we put a DKMS layer
on top that overrides all of LIO + ibmvscsis with the files from that GitHub
branch (which is based off of Yakkety-master).

So with all of that in mind, I sent the original message because I think the
patch might have a bug in it somewhere.  The reason I think that is that the
person who first tested this did not have the DKMS package (which overwrites
LIO + ibmvscsis) applied on top.

The difference was that all of the ibmvscsis files were already fully
backported into the 4.8.y kernel, so they had commit 0edcc47cdb7 but did not
have commit fa7e25cf13, and everything was functioning fine.  Then they
applied the DKMS package and started hitting this problem all the time.


-Bryant

Nick,

I can't easily test this since I can't re-create this bug in my environment.
But do you think there would be an issue if ibmvscsis_queue_data_in() always
reported 0 even though srp_transfer_data() failed?  Or should we fix that and
return -EIO for that failure?  I do think ibmvscsis_queue_status() should
always return 0 for us if the h_copy_rdma fails.
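
To make the two options concrete, here's a rough sketch of what I mean (not
our real code; the transfer helper is made up):

/* Option A: always report success, even if the data transfer failed.
 * target-core considers the command complete, and no queue-full handling
 * or retry runs for it.
 */
static int example_queue_data_in_swallow(struct se_cmd *se_cmd)
{
        if (example_transfer_data(se_cmd))
                pr_err("data-in transfer failed, reporting success anyway\n");
        return 0;
}

/* Option B: propagate -EIO, so target-core (with commit fa7e25cf13) marks
 * the command TRANSPORT_COMPLETE_QF_ERR and sends CHECK_CONDITION via
 * ->queue_status() instead of retrying ->queue_data_in().
 */
static int example_queue_data_in_propagate(struct se_cmd *se_cmd)
{
        return example_transfer_data(se_cmd) ? -EIO : 0;
}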

I am also trying to get a build out to them that reverts this patch, to see
if it would fix anything, but the person who handles that isn't in until
next week...

-Bryant
