On 06/19/13 15:44, Jack Wang wrote:
+	/*
+	 * It can occur that after fast_io_fail_tmo expired and before
+	 * dev_loss_tmo expired that the SCSI error handler has
+	 * offlined one or more devices. scsi_target_unblock() doesn't
+	 * change the state of these devices into running, so do that
+	 * explicitly.
+	 */
+	spin_lock_irq(shost->host_lock);
+	__shost_for_each_device(sdev, shost)
+		if (sdev->sdev_state == SDEV_OFFLINE)
+			sdev->sdev_state = SDEV_RUNNING;
+	spin_unlock_irq(shost->host_lock);
Do you have a test case to verify this behaviour?
Hello Jack,
This is what I came up with after analyzing why a so-called "port
flapping" test failed. The concept of that test is simple: use
ibportstate to disable and reenable the proper IB port on the switch
at random intervals and check whether I/O starts running again if the
path remains operational long enough. When running such a test for a few
days with random intervals between a few seconds and a few minutes,
sooner or later it will occur that scsi_try_host_reset() succeeds and
that scsi_eh_test_devices() fails. That will cause the SCSI error
handler to offline devices. Hence the above code to change the offline
state into running after a reconnect succeeds. I'm not proud of that
code but I couldn't find a better solution. Maybe the above code won't
be necessary anymore once we switch to Hannes' new SCSI error handler.
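
The flapping loop described above could be sketched roughly like this in
user-space C. This is an illustration under assumptions, not the actual
test script: the function names, the LID/port arguments and the interval
bounds are hypothetical, and only the "ibportstate <lid> <port>
disable|enable" invocation comes from the real tool. The check that I/O
resumes after a reconnect is left out.

```c
/* Sketch of a port-flapping loop; all names here are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Pick a delay in [min_s, max_s] seconds (requires min_s <= max_s). */
unsigned int random_interval(unsigned int min_s, unsigned int max_s)
{
	return min_s + (unsigned int)rand() % (max_s - min_s + 1);
}

/* Format "ibportstate <lid> <port> <op>" into buf; returns snprintf()'s result. */
int build_cmd(char *buf, size_t len, unsigned int lid, unsigned int port,
	      const char *op)
{
	return snprintf(buf, len, "ibportstate %u %u %s", lid, port, op);
}

/*
 * Disable and reenable one switch port 'cycles' times, sleeping a random
 * interval after each state change. Returns 0 on success and -1 as soon
 * as an ibportstate invocation fails.
 */
int flap_port(unsigned int lid, unsigned int port, unsigned int cycles,
	      unsigned int min_s, unsigned int max_s)
{
	char cmd[64];
	unsigned int i;

	for (i = 0; i < cycles; i++) {
		build_cmd(cmd, sizeof(cmd), lid, port, "disable");
		if (system(cmd) != 0)
			return -1;
		sleep(random_interval(min_s, max_s));

		build_cmd(cmd, sizeof(cmd), lid, port, "enable");
		if (system(cmd) != 0)
			return -1;
		sleep(random_interval(min_s, max_s));
	}
	return 0;
}
```

Calling something like flap_port(lid, port, cycles, 2, 300) from main()
with the LID and port number of the switch port in question would give
intervals between a few seconds and a few minutes.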
Bart.