[PATCH 0/4] scsi: scsi_dh_alua: handle target port unavailable state


This patch series resolves a problem in which all paths of a multipath device
became _permanently_ failed after a storage system had moved both controllers
into a _temporarily_ unavailable state (that is, SCSI_ACCESS_STATE_UNAVAILABLE).

This happened because once scsi_dh_alua had set the 'pg->state' to that value,
any IO coming to that PG via alua_prep_fn() would be immediately failed there.

It was possible to confirm that IO reaching that PG through another code path
(e.g., SG_IO) would perform normally once that PG's respective storage system
controller had transitioned back to an active state.
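
The failure mode described above can be sketched as a small userspace model.
This is an illustration only, not the scsi_dh_alua code: every model_* name is
hypothetical, the numeric values are the ALUA access-state codes as I
understand them from SPC, and deferring IO in the transitioning state is an
assumption about the handler's behavior. The allow_unavailable flag stands in
for the change made by Patch 1.

```c
#include <assert.h>
#include <stddef.h>

/* ALUA access states (hypothetical local names, values assumed from SPC). */
enum model_access_state {
	MODEL_STATE_OPTIMAL       = 0x0,
	MODEL_STATE_ACTIVE        = 0x1,	/* active/non-optimized */
	MODEL_STATE_STANDBY       = 0x2,
	MODEL_STATE_UNAVAILABLE   = 0x3,
	MODEL_STATE_LBA           = 0x4,
	MODEL_STATE_OFFLINE       = 0xe,
	MODEL_STATE_TRANSITIONING = 0xf,
};

/* Possible outcomes of the (modeled) prep step for one IO request. */
enum model_prep_result {
	MODEL_PREP_OK,		/* let the request proceed */
	MODEL_PREP_DEFER,	/* retry later */
	MODEL_PREP_KILL,	/* fail the request immediately */
};

/* Stand-in for a port group; only the cached state matters here. */
struct model_port_group {
	enum model_access_state state;
};

/*
 * Decide what to do with a request based on the cached PG state.
 * With allow_unavailable == 0 the unavailable state kills every request,
 * so a stale cached state keeps the path failed. With allow_unavailable != 0
 * the request goes out and the target's response can drive a state update.
 */
enum model_prep_result
model_prep(const struct model_port_group *pg, int allow_unavailable)
{
	assert(pg != NULL);

	switch (pg->state) {
	case MODEL_STATE_OPTIMAL:
	case MODEL_STATE_ACTIVE:
	case MODEL_STATE_LBA:
		return MODEL_PREP_OK;
	case MODEL_STATE_TRANSITIONING:
		return MODEL_PREP_DEFER;
	case MODEL_STATE_UNAVAILABLE:
		return allow_unavailable ? MODEL_PREP_OK : MODEL_PREP_KILL;
	default:
		return MODEL_PREP_KILL;
	}
}
```

In this model, a PG whose cached state is stuck at unavailable fails all IO
when allow_unavailable is 0, even if the controller has already recovered,
which matches the symptom seen through multipath but not through SG_IO.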

- Patch 1 essentially resolves that problem by allowing IO requests that arrive
  while the PG is in the SCSI_ACCESS_STATE_UNAVAILABLE state to actually
  proceed in alua_prep_fn(). It also schedules a recheck in alua_check_sense()
  to update pg->state properly. The problem/debug test-case is included in its
  commit message for reference.

- Patch 2 and Patch 3 address uncertainty & potentially incorrect assumptions
  when trying to reconcile the alua: RTPG information in the kernel logs with
  the actual port groups' state at a given point in time and with multipath/path
  checkers status/failed/reinstated messages, since scsi_dh_alua could update
  the PG state for the 'other' PG (i.e., not the PG through which the RTPG
  request was sent) but only present an updated state message for the
  'current' PG.

- Patch 4 silences the scsi_dh_alua messages about RTPG state/information for
  the unavailable state if it is no news (i.e., not a transition to/out-of it),
  only keeping the first and (potentially) last message (when it is some news).
  That's because during the period in which the unavailable state is in place,
  the path checkers will naturally have to go through alua_check_sense() path,
  which schedules a recheck and thus alua_rtpg() goes through the sdev_printk.
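
The message filtering described for Patch 4 can be sketched as a single
predicate. This is a minimal illustration under stated assumptions, not the
code from the patch: model_should_print() and MODEL_UNAVAILABLE are
hypothetical names, and the value 0x3 is the ALUA code for the unavailable
state as I understand it from SPC.

```c
#include <stdbool.h>

/* Assumed ALUA access-state code for "unavailable" (hypothetical name). */
#define MODEL_UNAVAILABLE 0x3

/*
 * Return true if an RTPG state message should be printed for a PG whose
 * state went from old_state to new_state.
 *
 * Only the "no news" case is silenced: the PG was unavailable and still is.
 * A transition into or out of the unavailable state is printed, and states
 * unrelated to unavailable are not filtered by this check.
 */
bool model_should_print(int old_state, int new_state)
{
	return !(old_state == MODEL_UNAVAILABLE &&
		 new_state == MODEL_UNAVAILABLE);
}
```

With this check, a run of rechecks that all observe the unavailable state
yields one message when the state is entered and one more when it is left.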

This patch series has been tested with the 4.11-rc4 kernel.

For documentation purposes, I'll reply to this cover letter with an analysis
of the observed cases of this problem and the accompanying kernel log messages.

Mauricio Faria de Oliveira (4):
  scsi: scsi_dh_alua: allow I/O in the target port unavailable state
  scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg()
    sdev_printk
  scsi: scsi_dh_alua: print changes to RTPG state of other PGs too
  scsi: scsi_dh_alua: do not print target port group state if it remains
    unavailable

 drivers/scsi/device_handler/scsi_dh_alua.c | 99 ++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 11 deletions(-)

-- 
1.8.3.1



