Re: [PATCH 49/51] scsi: Move eh_device_reset_handler() to use scsi_device as argument

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/17/21 6:13 PM, Steffen Maier wrote:
On 8/17/21 11:14 AM, Hannes Reinecke wrote:
When resetting a device we shouldn't depend on an existing SCSI
command, as this might be completed at any time.
Rather we should use 'struct scsi_device' as argument for
eh_device_reset_handler().

Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>

Acked-by: Steffen Maier <maier@xxxxxxxxxxxxx> # for zfcp

However, independent review comments for common code below...

---
  Documentation/scsi/scsi_eh.rst                |  2 +-
  Documentation/scsi/scsi_mid_low_api.rst       |  4 +--

  drivers/s390/scsi/zfcp_scsi.c                 |  4 +--

  drivers/scsi/scsi_error.c                     | 35 +++++++++++++------

  include/scsi/scsi_host.h                      |  2 +-
  62 files changed, 314 insertions(+), 329 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.rst b/Documentation/scsi/scsi_eh.rst
index e09c81a54702..23f0d09668d9 100644
--- a/Documentation/scsi/scsi_eh.rst
+++ b/Documentation/scsi/scsi_eh.rst
@@ -214,7 +214,7 @@ considered to fail always.
  ::

      int (* eh_abort_handler)(struct scsi_cmnd *);
-    int (* eh_device_reset_handler)(struct scsi_cmnd *);
+    int (* eh_device_reset_handler)(struct scsi_device *);
      int (* eh_target_reset_handler)(struct scsi_target *);
      int (* eh_bus_reset_handler)(struct Scsi_Host *, int);
      int (* eh_host_reset_handler)(struct Scsi_Host *);
diff --git a/Documentation/scsi/scsi_mid_low_api.rst b/Documentation/scsi/scsi_mid_low_api.rst
index 0afc1b4f89af..4650c0c6a22a 100644
--- a/Documentation/scsi/scsi_mid_low_api.rst
+++ b/Documentation/scsi/scsi_mid_low_api.rst
@@ -778,7 +778,7 @@ Details::

      /**
      *      eh_device_reset_handler - issue SCSI device reset
-    *      @scp: identifies SCSI device to be reset
+    *      @sdev: identifies SCSI device to be reset
      *
      *      Returns SUCCESS if command aborted else FAILED
      *
@@ -791,7 +791,7 @@ Details::
      *
      *      Optionally defined in: LLD
      **/
-    int eh_device_reset_handler(struct scsi_cmnd * scp)
+    int eh_device_reset_handler(struct scsi_device * sdev)


      /**


diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 6492c3b1b12f..4fa626763bb6 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -333,10 +333,8 @@ static int zfcp_scsi_task_mgmt_function(struct scsi_device *sdev, u8 tm_flags)
      return retval;
  }

-static int zfcp_scsi_eh_device_reset_handler(struct scsi_cmnd *scpnt)
+static int zfcp_scsi_eh_device_reset_handler(struct scsi_device *sdev)
  {
-    struct scsi_device *sdev = scpnt->device;
-
      return zfcp_scsi_task_mgmt_function(sdev, FCP_TMF_LUN_RESET);
  }



diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1d8e2f655833..44e29558b068 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -910,7 +910,7 @@ static enum scsi_disposition scsi_try_bus_device_reset(struct scsi_cmnd *scmd)

    struct scsi_host_template *hostt = scmd->device->host->hostt;

      if (!hostt->eh_device_reset_handler)
          return FAILED;

-    rtn = hostt->eh_device_reset_handler(scmd);
+    rtn = hostt->eh_device_reset_handler(scmd->device);
      if (rtn == SUCCESS)
          __scsi_report_device_reset(scmd->device, NULL);
      return rtn;

ok
(now that we use scmd->device 3 times in this function we could introduce a local variable sdev, similarly as starget in patch 36; but that would be more changed lines)


@@ -1195,6 +1195,7 @@ scsi_eh_action(struct scsi_cmnd *scmd, enum scsi_disposition rtn)
   * scsi_eh_finish_cmd - Handle a cmd that eh is finished with.

That old comment seems now in front of the new internal __scsi_eh_finish_cmd rathern than scsi_eh_finish_cmd ?

   * @scmd:    Original SCSI cmd that eh has finished.
   * @done_q:    Queue for processed commands.
+ * @result:    Final command status to be set

You introduce a 3rd argument named "host_byte" (not "result") below?

   *
   * Notes:
   *    We don't want to use the normal command completion while we are are

 *    still handling errors - it may cause other commands to be queued,
 *    and that would disturb what we are doing.  Thus we really want to

@@ -1203,10 +1204,18 @@ scsi_eh_action(struct scsi_cmnd *scmd, enum scsi_disposition rtn)
   *    keep a list of pending commands for final completion, and once we
   *    are ready to leave error handling we handle completion for real.
   */
-void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,

Should that new internal helper be static?

+            int host_byte)
  {
+    if (host_byte)
+        set_host_byte(scmd, host_byte);
      list_move_tail(&scmd->eh_entry, done_q);
  }
+

I whish we still had a kdoc for the actual API function scsi_eh_finish_cmd().

+void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
+{
+    __scsi_eh_finish_cmd(scmd, done_q, 0);
+}
  EXPORT_SYMBOL(scsi_eh_finish_cmd);

  /**
@@ -1381,7 +1390,8 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
                  if (finish_cmds &&
                      (try_stu ||
                       scsi_eh_action(scmd, SUCCESS) == SUCCESS))
-                    scsi_eh_finish_cmd(scmd, done_q);
+                    __scsi_eh_finish_cmd(scmd, done_q,
+                                 DID_RESET);
                  else
                      list_move_tail(&scmd->eh_entry, work_q);
              }
@@ -1529,8 +1539,9 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
                               work_q, eh_entry) {
                      if (scmd->device == sdev &&
                          scsi_eh_action(scmd, rtn) != FAILED)
-                        scsi_eh_finish_cmd(scmd,
-                                   done_q);
+                        __scsi_eh_finish_cmd(scmd,
+                                     done_q,
+                                     DID_RESET);
                  }
              }
          } else {
@@ -1598,7 +1609,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
              if (rtn == SUCCESS)
                  list_move_tail(&scmd->eh_entry, &check_list);
              else if (rtn == FAST_IO_FAIL)
-                scsi_eh_finish_cmd(scmd, done_q);
+                __scsi_eh_finish_cmd(scmd, done_q,
+                             DID_TRANSPORT_DISRUPTED);
              else
                  /* push back on work queue for further processing */
                  list_move(&scmd->eh_entry, work_q);
@@ -1663,8 +1675,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
              list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                  if (channel == scmd_channel(scmd)) {
                      if (rtn == FAST_IO_FAIL)
-                        scsi_eh_finish_cmd(scmd,
-                                   done_q);
+                        __scsi_eh_finish_cmd(scmd,
+                                     done_q,
+                                     DID_TRANSPORT_DISRUPTED);
                      else
                          list_move_tail(&scmd->eh_entry,
                                     &check_list);
@@ -1707,9 +1720,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost,
          if (rtn == SUCCESS) {
              list_splice_init(work_q, &check_list);
          } else if (rtn == FAST_IO_FAIL) {
-            list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-                    scsi_eh_finish_cmd(scmd, done_q);
-            }
+            list_for_each_entry_safe(scmd, next, work_q, eh_entry)
+                __scsi_eh_finish_cmd(scmd, done_q,
+                             DID_TRANSPORT_DISRUPTED);
          } else {
              SCSI_LOG_ERROR_RECOVERY(3,
                  shost_printk(KERN_INFO, shost,

I don't understand the RESET vs. DISRUPTED depending on escalation level.
Care to explain in the patch description (or even code comment)?
Is there any functional change compared to today and if so which?


Well.
That's arguably slightly odd, but anyway:
_in principle_ an EH handler might return FAST_IO_FAIL, which would indicate that the EH function could not complete due the transport being busy (eg RSCN processing taking longer than expected). In that case it's expected to be a transient condition, which might/should be resolved with a simple retry. Hence we should indicate that by completing commands with DID_TRANSPORT_DISRUPTED, to allow upper layers to retry.

Having said that it's slightly odd to have FAST_IO_FAIL being returned from a host reset, as really the intention is that the driver will be in a stable state after reset. But this doesn't seem to be true for zfcp, as we're calling fc_block_rport() after the reset itself, which means that there's a chance we'll be running into this situation.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux