Re: [PATCH 0/3] target: Fix queue-full callback error signaling

Hi Bryant,

(Re-adding the target-devel CC)

On Mon, 2017-07-31 at 16:06 -0500, Bryant G. Ly wrote:
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>

Hi folks,

This series contains target-core queue-full + iscsi-target
callback fixes for a bug reported by Steve and Potnuri
during recent v4.8.y iWARP/iser-target testing.

The first patch fixes target-core queue-full handling response
leaks with non -EAGAIN / -ENOMEM errors.  It uses a new state
TRANSPORT_COMPLETE_QF_ERR to internally generate CHECK_CONDITION
for unknown fabric callback errors, to avoid attempting retry
of fabric data-transfer callbacks for this special case.

This means all non -EAGAIN / -ENOMEM fabric callback errors
during target_core_fabric_ops for:

    *) ->write_pending()
    *) ->queue_data_in()
    *) ->queue_status()

will result in CHECK_CONDITION + LOGICAL_UNIT_COMMUNICATION_FAILURE,
if no non-zero se_cmd->scsi_status was previously set.
It also means target-core retries ->queue_status() errors indefinitely,
or until session shutdown explicitly stops outstanding I/O.
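
For example (illustrative sketch only, not from any in-tree fabric driver;
the response-posting helper is made up, and the usual target_core_fabric.h
definitions are assumed), a fabric ->queue_status() looks roughly like:

static int example_queue_status(struct se_cmd *se_cmd)
{
        /* Hypothetical helper that posts SCSI status + sense back to the
         * initiator and returns 0 or a negative errno.
         */
        int ret = example_post_response(se_cmd);

        /* Per the above, any error returned here is retried by target-core
         * from process context until it succeeds, or until session shutdown
         * stops the outstanding I/O.
         */
        return ret;
}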

The remaining changes are for propagating iscsit_transport
response failure back to target-core queue-full, and updating
iser-target to propagate isert_rdma_rw_ctx_post() errors from
RDMA R/W API back to target-core as well.

Please review.

--nab

Nicholas Bellinger (3):
    target: Fix unknown fabric callback queue-full errors
    iscsi-target: Propigate queue_data_in + queue_status errors
    iser-target: Fix queue-full response handling

   drivers/infiniband/ulp/isert/ib_isert.c      |  53 +++++++++-----
   drivers/target/iscsi/iscsi_target.c          |   3 +-
   drivers/target/iscsi/iscsi_target_configfs.c |  13 ++--
   drivers/target/iscsi/iscsi_target_util.c     |   5 +-
   drivers/target/iscsi/iscsi_target_util.h     |   2 +-
   drivers/target/target_core_transport.c       | 102 ++++++++++++++++++---------
   include/target/target_core_base.h            |   1 +
   7 files changed, 114 insertions(+), 65 deletions(-)

Hi Nick,

I was sent this trace, and to me it looks like our driver isn't getting a
response back from target-core when we return -EIO from
ibmvscsis_write_pending() after a failed srp_transfer_data().  This then
locks us up on a wait_for_completion(), since our active_queue /
scheduled_queues aren't empty, so we keep waiting.  Do you think there is
still a leak for outgoing responses when our fabric driver returns
something other than -EAGAIN or -ENOMEM?

Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.123071] dev[c0000001fe6be000]: Backstore name '830530a9-2d65-317c-bea0-9ebc82739b64' is too long for INQUIRY_MODEL, truncating to 16 bytes
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592683] ibmvscsis 30000006: rdma: h_copy_rdma failed, rc -11
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592699] ibmvscsis: srp_transfer_data() failed: -11
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592705] Got unknown fabric queue status: -5
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592713] ibmvscsis 30000006: build_response: error copying to client, rc -11, flags 0x140, state 0x10
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592724] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592731] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592738] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:34 tuk6r1phn1 kernel: [494628.592745] ibmvscsis: write_pending failed since: 320
Jul 20 16:53:35 tuk6r1phn1 kernel: [494628.620264] ibmvscsis: connection lost with outstanding work
Jul 20 16:53:35 tuk6r1phn1 kernel: [494628.666964] HVCS: Closed vty-server@30000008 and partner vty@30000000:2 connection.
<SNIP>
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428287] INFO: task kworker/1:4:6743 blocked for more than 120 seconds.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428296] Tainted: G OE 4.8.0-58-generic #63~16.04.1-Ubuntu
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428301] kworker/1:4 D 0000000000000000 0 6743 2 0x00000800
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428310] Workqueue: ibmvscsis30000006 ibmvscsis_disconnect [ibmvscsis]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428310] Call Trace:
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428313] [c0000001ae0c36f0] [c0000000014ef090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428316] [c0000001ae0c38c0] [c00000000001a610] __switch_to+0x280/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428319] [c0000001ae0c3920] [c000000000b4b6e4] __schedule+0x2f4/0x990
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428321] [c0000001ae0c3a00] [c000000000b4bdcc] schedule+0x4c/0xc0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428322] [c0000001ae0c3a30] [c000000000b50404] schedule_timeout+0x284/0x480
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428324] [c0000001ae0c3b20] [c000000000b4cd9c] wait_for_common+0xec/0x240
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428326] [c0000001ae0c3ba0] [d000000008f66540] ibmvscsis_disconnect+0x240/0x9d0 [ibmvscsis]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428329] [c0000001ae0c3c50] [c0000000000f1c28] process_one_work+0x1e8/0x5b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428330] [c0000001ae0c3ce0] [c0000000000f2098] worker_thread+0xa8/0x650
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428332] [c0000001ae0c3d80] [c0000000000fa864] kthread+0x114/0x140
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428334] [c0000001ae0c3e30] [c0000000000098f0] ret_from_kernel_thread+0x5c/0x6c
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428338] INFO: task targetcli:9627 blocked for more than 120 seconds.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428341] Tainted: G OE 4.8.0-58-generic #63~16.04.1-Ubuntu
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428343] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428345] targetcli D 00003fff9bf3fd24 0 9627 9626 0x00040000
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428347] Call Trace:
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428349] [c0000001ad647700] [c0000000014ef090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428351] [c0000001ad6478d0] [c00000000001a610] __switch_to+0x280/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428352] [c0000001ad647930] [c000000000b4b6e4] __schedule+0x2f4/0x990
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428354] [c0000001ad647a10] [c000000000b4bdcc] schedule+0x4c/0xc0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428355] [c0000001ad647a40] [c000000000b50404] schedule_timeout+0x284/0x480
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428357] [c0000001ad647b30] [c000000000b4cd9c] wait_for_common+0xec/0x240
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428366] [c0000001ad647bb0] [d0000000079a4d60] transport_clear_lun_ref+0x50/0x70 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428373] [c0000001ad647be0] [d00000000799d0e8] core_tpg_remove_lun+0x58/0x160 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428379] [c0000001ad647c20] [d0000000079895fc] core_dev_del_lun+0x4c/0x150 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428385] [c0000001ad647cb0] [d00000000798b590] target_fabric_port_unlink+0x70/0x90 [target_core_mod]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428388] [c0000001ad647ce0] [d000000004fd5bb8] configfs_unlink+0x188/0x2c0 [configfs]
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428390] [c0000001ad647d40] [c000000000338774] vfs_unlink+0x104/0x2a0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428392] [c0000001ad647d90] [c00000000033fe08] do_unlinkat+0x378/0x3b0
Jul 20 16:56:32 tuk6r1phn1 kernel: [494806.428393] [c0000001ad647e30] [c000000000009584] system_call+0x38/0xec
Jul 20 16:58:33 tuk6r1phn1 kernel: [494927.259770] INFO: task kworker/1:4:6743 blocked for more than 120 seconds.


On v4.10.y and earlier code without commit fa7e25cf13, returning
anything other than -EAGAIN or -ENOMEM from fabric code via
TFO->write_pending() will result in a lost response and subsequent hung
tasks..

In ibmvscsi's case, I see -EAGAIN was changed to -EIO in commit
0edcc47cdb7 to avoid the endless loop on v4.9.y code, when the
srp_transfer_data() would fail repeatedly.

That said, with commit fa7e25cf13 in place, anything other than -EAGAIN
or -ENOMEM from TFO->write_pending() will reach the following code in
transport_handle_queue_full():

        /*
         * -EAGAIN or -ENOMEM signals retry of ->write_pending() and/or
         * ->queue_data_in() callbacks from new process context.
         *
         * Otherwise for other errors, transport_complete_qf() will send
         * CHECK_CONDITION via ->queue_status() instead of attempting to
         * retry associated fabric driver data-transfer callbacks.
         */
        if (err == -EAGAIN || err == -ENOMEM) {
                cmd->t_state = (write_pending) ? TRANSPORT_COMPLETE_QF_WP :
                                                 TRANSPORT_COMPLETE_QF_OK;
        } else {
                pr_warn_ratelimited("Got unknown fabric queue status: %d\n", err);
                cmd->t_state = TRANSPORT_COMPLETE_QF_ERR;
        }

which sets TRANSPORT_COMPLETE_QF_ERR, and invokes the following in
transport_complete_qf() to immediately complete se_cmd with any fabric
driver provided ->scsi_status, or return CHECK_CONDITION with logical
unit communication failure sense data if no ->scsi_status is set:

        transport_complete_task_attr(cmd);
        /*
         * If a fabric driver ->write_pending() or ->queue_data_in() callback
         * has returned neither -ENOMEM or -EAGAIN, assume it's fatal and
         * the same callbacks should not be retried.  Return CHECK_CONDITION
         * if a scsi_status is not already set.
         *
         * If a fabric driver ->queue_status() has returned non zero, always
         * keep retrying no matter what..
         */
        if (cmd->t_state == TRANSPORT_COMPLETE_QF_ERR) {
                if (cmd->scsi_status)
                        goto queue_status;

                cmd->se_cmd_flags |= SCF_EMULATED_TASK_SENSE;
                cmd->scsi_status = SAM_STAT_CHECK_CONDITION;
                cmd->scsi_sense_length = TRANSPORT_SENSE_BUFFER;
                translate_sense_reason(cmd, TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
                goto queue_status;
        }

So for ibmvscsi on v4.8.y code, I'd backport commit fa7e25cf13 to your
v4.8.y tree, and if/when a specific srp_transfer_data() failure is
retriable, return -EAGAIN.  Or, if the srp_transfer_data() failure is
not retriable and se_cmd should be completed immediately with exception
status, return -EIO.
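
As a rough sketch only (this is not the actual ibmvscsis code, and which
h_copy_rdma / srp_transfer_data() failures are transient is an assumption
here), that mapping would look something like:

static int example_write_pending(struct se_cmd *se_cmd)
{
        /* Hypothetical wrapper around srp_transfer_data() / h_copy_rdma. */
        int rc = example_transfer_data(se_cmd);

        if (rc == -EAGAIN)
                /* Assumed-transient failure: target-core re-queues and
                 * retries ->write_pending() from process context.
                 */
                return -EAGAIN;
        if (rc)
                /* Fatal failure: with commit fa7e25cf13, target-core
                 * completes se_cmd immediately with CHECK_CONDITION
                 * instead of retrying the callback.
                 */
                return -EIO;

        return 0;
}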

Btw, I'd been thinking about including commit fa7e25cf13 for stable
backports for the iser-target consumers that need it.  If it addresses
this ibmvscsi case as well on <= v4.10.y code, that's even more of a
reason to add it to stable.


Hi Nick,

Thanks for the explanation.  The thing is, my tree already has commit fa7e25cf13:

https://github.com/powervm/ibmvscsis/tree/Yakkety_develop

That tree contains commit fa7e25cf13, as well as commit 0edcc47cdb7, which
changes the write_pending srp_transfer_data() failure to return -EIO.

The trace shows the 4.8.0-58-generic Ubuntu kernel, but we put a DKMS layer
on top that overrides all of LIO + ibmvscsis with the files from that GitHub
branch (which is based off of Yakkety-master).

So with all of that in mind, I sent the original message because I think the
patch might have a bug in it somewhere.  The reason I think that is that the
person who first tested this did not have the DKMS package (which overwrites
LIO + ibmvscsis) applied on top.

The difference was that all of the ibmvscsis files were already fully
backported into the 4.8.y kernel, so they had commit 0edcc47cdb7 but did not
have commit fa7e25cf13, and everything was functioning fine.  Then they
applied the DKMS package and started hitting this problem all the time.


-Bryant

Nick,

I can't easily test this since I can't re-create this bug in my environment.
But do you think there would be an issue if ibmvscsis_queue_data_in() always
reported 0 even though srp_transfer_data() failed?  Or should we fix that and
return -EIO for that failure?  I do think ibmvscsis_queue_status() should
always return 0 for us if the h_copy_rdma fails.
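
To make the two options concrete, here's a rough sketch of what I mean (not
our real code; the transfer helper is made up):

/* Option A: always report success, even if the data transfer failed.
 * target-core considers the command complete, and no queue-full handling
 * or retry runs for it.
 */
static int example_queue_data_in_swallow(struct se_cmd *se_cmd)
{
        if (example_transfer_data(se_cmd))
                pr_err("data-in transfer failed, reporting success anyway\n");
        return 0;
}

/* Option B: propagate -EIO, so target-core (with commit fa7e25cf13) marks
 * the command TRANSPORT_COMPLETE_QF_ERR and sends CHECK_CONDITION via
 * ->queue_status() instead of retrying ->queue_data_in().
 */
static int example_queue_data_in_propagate(struct se_cmd *se_cmd)
{
        return example_transfer_data(se_cmd) ? -EIO : 0;
}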

I am also trying to get a build out to them that reverts this patch, to see
if it would fix anything, but the person who handles that isn't in until
next week...

-Bryant
