md and sd out of sync

I have 16 x 2TB drives, each partitioned into 3 equal-sized partitions.

Three md RAID6 arrays were then built, each utilising one partition from each drive.
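
(For reference, a layout like this would be created with something along these lines - array names and drive letters are illustrative, not the actual ones:)

mdadm --create /dev/md0 --level=6 --raid-devices=16 /dev/sd[b-q]1
mdadm --create /dev/md1 --level=6 --raid-devices=16 /dev/sd[b-q]2
mdadm --create /dev/md2 --level=6 --raid-devices=16 /dev/sd[b-q]3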

Over the weekend, one member of one array was failed out:

end_request: I/O error, dev sdz, sector 1302228737
md: super_written gets error=-5, uptodate=0
raid5: Disk failure on sdz1, disabling device.
raid5: Operation continuing on 15 devices.

Checking with smartctl is not an option, as the controller (an LSI SAS) reacts badly to it. On the basis that this might have been a transient error, or a sector that could be remapped on resync, I re-added the device to the array.
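
(The re-add was something like the following - array name illustrative:)

mdadm /dev/md0 --re-add /dev/sdz1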

This failed part way through and caused enough disruption to the controller that the whole drive was taken offline:

sd 8:0:24:0: [sdz] <6>sd 8:0:24:0: [sdz] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdz, sector 569772337
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdz, sector 569769777
sd 8:0:24:0: [sdz] <6>mptsas: ioc0: removing sata device, channel 0, id 32, phy 11
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdz, sector 569770097
 port-8:1:8: mptsas: ioc0: delete port (8)


I would have thought that at this point md would report that the other two arrays had each lost their /dev/sdz component, but this is not the case - it still shows healthy arrays.
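
By "healthy" I mean that checks like the following (md device name illustrative) still list all 16 members as active:

cat /proc/mdstat
mdadm --detail /dev/md1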

Is this expected behaviour?

To complicate things further, the disconnected drive was then, without any intervention, recognised as a new device and attached as /dev/sdai:

mptsas: ioc0: attaching sata device, channel 0, id 32, phy 11
scsi 8:0:34:0: Direct-Access ATA WDC WD2003FYYS-0 0D02 PQ: 0 ANSI: 5
sd 8:0:34:0: [sdai] 3907029168 512-byte hardware sectors (2000399 MB)
sd 8:0:34:0: [sdai] Write Protect is off
sd 8:0:34:0: [sdai] Mode Sense: 73 00 00 08
sd 8:0:34:0: [sdai] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 8:0:34:0: [sdai] 3907029168 512-byte hardware sectors (2000399 MB)
sd 8:0:34:0: [sdai] Write Protect is off
sd 8:0:34:0: [sdai] Mode Sense: 73 00 00 08
sd 8:0:34:0: [sdai] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdai: sdai1 sdai2 sdai3
sd 8:0:34:0: [sdai] Attached SCSI disk
sd 8:0:34:0: Attached scsi generic sg26 type 0


Because sdz no longer exists, I cannot fail and remove /dev/sdz2 and /dev/sdz3 from the other two md arrays.
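
I believe more recent mdadm releases accept the keywords "failed" and "detached" in place of a device name, which might have allowed something like the following (untested here):

mdadm /dev/md1 --fail detached
mdadm /dev/md1 --remove detached   # likewise for the third array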

I will proceed by replacing the drive and rebooting, at which point I should be able to re-add it to all three arrays (sketched below), but I wanted to draw attention to how oblivious md seems to be to all the changes that have occurred. Maybe things have changed in later versions:

Kernel 2.6.27.19-78.2.30.fc9.x86_64
mdadm 2.6.4
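
(The post-reboot recovery I have in mind is simply the following, per array - names illustrative, after partitioning the replacement drive to match:)

mdadm /dev/md0 --add /dev/sdz1
mdadm /dev/md1 --add /dev/sdz2
mdadm /dev/md2 --add /dev/sdz3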

Regards,

Richard