I/O is retried forever if the last disk in an md device fails

Hello list, 

we use Linux software RAID (md) to do host-based mirroring across two SAN storage systems. We also use clustering for high availability and DMMP for multipathing. This lets us build fully redundant disaster recovery solutions. Our setup looks like this (logical view; a rough sketch of how the stack is assembled follows the diagram):

-----------------------------
|        filesystem (xfs)   |
-----------------------------
|         lvm lv            |
-----------------------------
|         lvm vg            |
-----------------------------
|    md device  (/dev/mdY)  |
-----------------------------
| dmmp device | dmmp device |
| (/dev/dm-X) | (/dev/dm-Y) |
-----------------------------
|  sda | sdb  |  sdc | sdd  |
-----------------------------
| Storage LUN | Storage LUN |
| (Storage 1) | (Storage 2) |
-----------------------------
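
For reference, the stack is assembled roughly like this. This is only a sketch; the device names, VG/LV names and sizes (sanvg, datalv, 100G) are placeholders, not our real configuration:

  # RAID1 mirror across the two DMMP devices (one LUN per storage system)
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/dm-X /dev/dm-Y

  # LVM and XFS on top of the mirror
  pvcreate /dev/md0
  vgcreate sanvg /dev/md0
  lvcreate -L 100G -n datalv sanvg
  mkfs.xfs /dev/sanvg/datalv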

However, when we tested this setup, one test was pulling all fibre channel cables from a server. During the test the mirror was degraded (only one DMMP device active in the mirror); the remaining disk was the DMMP device /dev/dm-X. DMMP is configured to report I/O errors up the stack once all paths are dead (no_path_retry=fail, see the snippet below). When we pulled the cables, DMMP detected the path failures after approx. 60 seconds. Because all paths were dead, the DMMP device was declared dead and I/O to it failed.
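
For completeness, the relevant part of our /etc/multipath.conf looks roughly like this (storage-specific device sections omitted):

  defaults {
          # fail I/O immediately when all paths are down,
          # instead of queueing it inside DMMP
          no_path_retry    fail
  }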

  mdadm -E /dev/dm-X returns "Could not read md superblock on device"

  dd if=/dev/dm-X of=/dev/null bs=1M count=1

also fails with I/O errors.

The problem is that all I/O to the md device is queued and retried forever. The /var/log/messages file shows: 

SCSI ERROR .. Return code = 0x10000
[... Many times ...]
SCSI ERROR .. Return code = 0x10000
raid1: dm-12 rescheduling sector 0

This repeats in an endless loop until I shut down the machine.
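
For what it's worth, the behaviour can be reproduced without pulling cables by loading a device-mapper "error" target into the surviving mirror leg, so that every I/O to it fails immediately. A rough sketch (mpath0 is a placeholder for the real multipath map name):

  MPDEV=mpath0
  SECTORS=$(blockdev --getsz /dev/mapper/$MPDEV)
  dmsetup suspend $MPDEV
  echo "0 $SECTORS error" | dmsetup reload $MPDEV
  dmsetup resume $MPDEV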

This prevents our cluster from doing a clean failover of the device, because it cannot deactivate the logical volumes on top of the md device: the "lvchange -an" call hangs forever. I think this issue is not DMMP-specific and should be the same for any device type.
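
When the hang occurs, the blocked call can be confirmed from another shell, for example like this (assuming sysrq is enabled; sysrq-t dumps all task stack traces to the kernel log):

  # lvchange sits in D (uninterruptible sleep) state
  ps -eo pid,stat,wchan:30,args | grep lvchange

  # dump task stack traces for inspection
  echo t > /proc/sysrq-trigger
  dmesg | tail -100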

We are running the SLES9 SP3 kernel on IA64.

Is there a config option, or a later version of the md kernel code / tools, that reports I/O errors up the stack if the last disk in an md device fails?

If there is a later version, we may be able to get the patch backported to the SLES9 kernel under our SuSE support agreement.

Regards, 
Robert 

------------------------------------------------------------------------
COMPUTER CONCEPT 
CC Computersysteme und Kommunikationstechnik GmbH 
Robert Heinzmann
Wiener Str. 114 - 116		Email:	heinzmann@xxxxxxxxxxxxx
01219 Dresden			Phone:	+49 (351) 8 76 92-0
					Fax:	+49 (351) 8 76 92-99
					Internet:	http://www.cc-dresden.de
Commercial register Dresden, HRB 214
Managing director: Gerd Jelinek
------------------------------------------------------------------------ 