Fast failing a disk.

Hi Everyone,

The setup is a RAID 5 across a few disks, and at some stage a disk is pulled from the enclosure, so we get a disk failure.

When only reads are running on the RAID 5 array, the time between the physical removal of the disk and md failing that disk is about 4 seconds on my system.
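
For what it is worth, I simply read those 4 seconds off the kernel log timestamps, roughly like this (the message texts match the dmesg excerpt below; paths and exact wording may differ on other setups):

    # Timestamp of the phy-offline event reported by the MPT Fusion driver
    t_gone=$(dmesg | grep -m1 'is now offline' | sed 's/^\[ *\([0-9.]*\)\].*/\1/')
    # Timestamp of md/raid5 actually failing the member
    t_fail=$(dmesg | grep -m1 'Disk failure on' | sed 's/^\[ *\([0-9.]*\)\].*/\1/')
    # The difference is the delay I am trying to shrink
    echo "$t_fail - $t_gone" | bc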

I am trying, through scripting, to shorten these 4 seconds.

Since those 4 seconds are presumably needed to determine for sure that the disk has really failed or is really gone, my approach is to use the LSI SAS MPT Fusion driver to detect when the physical disk phy has gone offline (which happens in near real time), and from there to send a "./mdadm --fail /dev/md/dX /dev/sdX" command immediately after detecting the failure (a rough sketch of the glue script follows below).
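
The glue between the driver event and mdadm is essentially a small script along the following lines; the kernel log path, the exact message text, and the device/array names are specific to my setup and are only meant as an illustration:

    #!/bin/bash
    # Fail the md member as soon as the MPT Fusion driver reports its phy offline.
    ARRAY=/dev/md/d0
    MEMBER=/dev/sdg2    # member sitting behind the monitored phy

    tail -n0 -F /var/log/kern.log | while read -r line; do
        case "$line" in
            *"is now offline"*)
                echo "phy offline detected, failing $MEMBER"
                mdadm --fail "$ARRAY" "$MEMBER"
                break
                ;;
        esac
    done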

This is much faster than waiting for all the timeouts present in sd/md.

It is also safe: once a disk phy has gone offline, the disk is physically gone, so there is no need for any timeouts or retries.

All of the above works fine and would solve my problem of speeding up the failing of a disk, except that when I send the --fail command immediately after the disk has been removed from the enclosure, the md array seems to loop in a "resync"/recovery state instead of reconfiguring itself as a degraded array.

Here is a copy of dmesg showing the phenomenon:

[  744.631035] ioc2 Event: 0xf
[  745.452060] ioc2 Event: SAS_DISCOVERY
[ 745.465349] ioc2: Phy 13 Handle a sas addr: 0x50015b22300009e0 is now offline
[  745.485209] ioc2 Event: SAS_DISCOVERY
[  745.693492] raid5: Disk failure on sdg2, disabling device.
[  745.693497] raid5: Operation continuing on 4 devices.
[  745.718787] md: recovery of RAID array md_d0
[  745.723047] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 745.728862] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  745.738402] md: using 2048k window, over a total of 9429760 blocks.
[  745.744655] md: resuming recovery of md_d0 from checkpoint.
[  745.756660] md: md_d0: recovery done.
[  745.761900] RAID5 conf printout:
[  745.765148]  --- rd:5 wd:4
[  745.767855]  disk 0, o:1, dev:sdd2
[  745.771243]  disk 1, o:1, dev:sda2
[  745.774634]  disk 2, o:1, dev:sdc2
[  745.778029]  disk 3, o:0, dev:sdg2
[  745.781424]  disk 4, o:1, dev:sdb2
[  745.799202] md: recovery of RAID array md_d0
[  745.803460] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 745.809275] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  745.818814] md: using 2048k window, over a total of 9429760 blocks.
[  745.825066] md: resuming recovery of md_d0 from checkpoint.
[  745.837066] md: md_d0: recovery done.
[  745.842872] RAID5 conf printout:
[  745.846187]  --- rd:5 wd:4
[  745.848890]  disk 0, o:1, dev:sdd2
[  745.852281]  disk 1, o:1, dev:sda2
[  745.855673]  disk 2, o:1, dev:sdc2
[  745.859064]  disk 3, o:0, dev:sdg2
[  745.862459]  disk 4, o:1, dev:sdb2
[  745.919801] md: recovery of RAID array md_d0
[  745.924090] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 745.929949] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  745.939491] md: using 2048k window, over a total of 9429760 blocks.
[  745.945742] md: resuming recovery of md_d0 from checkpoint.
[  745.957745] md: md_d0: recovery done.
[  746.149794] RAID5 conf printout:
[  746.153051]  --- rd:5 wd:4
[  746.155752]  disk 0, o:1, dev:sdd2
[  746.159151]  disk 1, o:1, dev:sda2
[  746.162543]  disk 2, o:1, dev:sdc2
[  746.165939]  disk 3, o:0, dev:sdg2
[  746.169334]  disk 4, o:1, dev:sdb2
[  746.369081] md: cannot remove active disk sdg2 from md_d0 ...
[  746.374866] md: recovery of RAID array md_d0
[  746.379129] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 746.384949] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  746.394489] md: using 2048k window, over a total of 9429760 blocks.
[  746.400794] md: resuming recovery of md_d0 from checkpoint.
[  746.412814] md: md_d0: recovery done.
[  746.436268] md: cannot remove active disk sdg2 from md_d0 ...
[  746.491071] RAID5 conf printout:
[  746.494321]  --- rd:5 wd:4
[  746.497027]  disk 0, o:1, dev:sdd2
[  746.500420]  disk 1, o:1, dev:sda2
[  746.503811]  disk 2, o:1, dev:sdc2
[  746.507202]  disk 3, o:0, dev:sdg2
[  746.510598]  disk 4, o:1, dev:sdb2
[  746.594835] md: recovery of RAID array md_d0
[  746.599097] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 746.604915] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  746.614453] md: using 2048k window, over a total of 9429760 blocks.
[  746.620698] md: resuming recovery of md_d0 from checkpoint.
[  746.632689] md: md_d0: recovery done.
[  746.683892] RAID5 conf printout:
[  746.687118]  --- rd:5 wd:4
[  746.689825]  disk 0, o:1, dev:sdd2
[  746.693224]  disk 1, o:1, dev:sda2
[  746.696617]  disk 2, o:1, dev:sdc2
[  746.700012]  disk 3, o:0, dev:sdg2
[  746.703404]  disk 4, o:1, dev:sdb2
[  746.733530] md: recovery of RAID array md_d0
[  746.737820] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 746.743635] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  746.753171] md: using 2048k window, over a total of 9429760 blocks.
[  746.759419] md: resuming recovery of md_d0 from checkpoint.
[  746.771432] md: md_d0: recovery done.
[  746.821801] RAID5 conf printout:
[  746.825049]  --- rd:5 wd:4
[  746.827754]  disk 0, o:1, dev:sdd2
[  746.831150]  disk 1, o:1, dev:sda2
[  746.834545]  disk 2, o:1, dev:sdc2
[  746.837940]  disk 3, o:0, dev:sdg2
[  746.841336]  disk 4, o:1, dev:sdb2
[  746.853983] md: recovery of RAID array md_d0
[  746.858270] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 746.864088] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  746.873627] md: using 2048k window, over a total of 9429760 blocks.
[  746.879872] md: resuming recovery of md_d0 from checkpoint.
[  746.891885] md: md_d0: recovery done.
[  746.942047] RAID5 conf printout:
[  746.945301]  --- rd:5 wd:4
[  746.948008]  disk 0, o:1, dev:sdd2
[  746.951398]  disk 1, o:1, dev:sda2
[  746.954788]  disk 2, o:1, dev:sdc2
[  746.958183]  disk 3, o:0, dev:sdg2
[  746.961579]  disk 4, o:1, dev:sdb2
[  746.974231] md: recovery of RAID array md_d0
[  746.978521] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 746.984339] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  746.993887] md: using 2048k window, over a total of 9429760 blocks.
[  747.000136] md: resuming recovery of md_d0 from checkpoint.
[  747.012149] md: md_d0: recovery done.
[  747.062726] RAID5 conf printout:
[  747.065976]  --- rd:5 wd:4
[  747.068683]  disk 0, o:1, dev:sdd2
[  747.072075]  disk 1, o:1, dev:sda2
[  747.075468]  disk 2, o:1, dev:sdc2
[  747.078859]  disk 3, o:0, dev:sdg2
[  747.082253]  disk 4, o:1, dev:sdb2
[  747.202204] md: recovery of RAID array md_d0
[  747.206499] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 747.212317] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  747.221855] md: using 2048k window, over a total of 9429760 blocks.
[  747.228107] md: resuming recovery of md_d0 from checkpoint.
[  747.240122] md: md_d0: recovery done.
[  747.291282] RAID5 conf printout:
[  747.294530]  --- rd:5 wd:4
[  747.297238]  disk 0, o:1, dev:sdd2
[  747.300636]  disk 1, o:1, dev:sda2
[  747.304026]  disk 2, o:1, dev:sdc2
[  747.307418]  disk 3, o:0, dev:sdg2


... This continues for precisely 4 seconds, the same time it takes when I do not send the --fail command and let md fail the disk on its own. At that point the sd device is kicked out and the array becomes degraded, or md detects too many IO errors and the same thing happens.

On the other hand, the interesting point is that doing the same thing while running read IOs but without physically pulling a disk (i.e. issuing --fail against a disk that is present, healthy, and serving those reads) works fine.
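
Concretely, that working case is nothing more than the following (the read load here is an arbitrary dd; device names are from my setup):

    # Keep some read IO going on the array...
    dd if=/dev/md/d0 of=/dev/null bs=1M &

    # ...then fail a member that is still physically present and healthy.
    mdadm --fail /dev/md/d0 /dev/sdg2
    cat /proc/mdstat    # the member shows up as (F) and the array goes degraded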

It seems that if a --fail command is sent for a disk that is currently in the sd or md error-checking and/or recovery path, the command is not ignored but instead triggers an endless loop of recoveries (very fast ones in the case above, since only read IOs are running), and it only takes effect once the normal error handling has completed and decided to kick the device out.
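
A crude stopgap would be to keep retrying until the member really shows up as faulty, roughly as in the sketch below (untested; the sysfs path is an assumption based on the md_d0 name in the dmesg above), but I would prefer to understand the underlying behaviour:

    # Re-issue --fail until md actually marks the member faulty, then remove it.
    STATE=/sys/block/md_d0/md/dev-sdg2/state
    until grep -q faulty "$STATE" 2>/dev/null; do
        mdadm --fail /dev/md/d0 /dev/sdg2
        sleep 0.2
    done
    mdadm --remove /dev/md/d0 /dev/sdg2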

Would anybody know what is causing this, and whether there is a way around it?

The command used to create the RAID is the following:

"mdadm --create -vvv --force --run --metadata=1.2 /dev/md/d0 --level=5 --size=9429760 --chunk=64 --name=test_01 -n5 --bitmap=internal --bitmap-chunk=4096 --layout=ls /dev/sdd2 /dev/sda2 /dev/sdc2 /dev/sdg2 /dev/sdb2"

Thank you very much in advance!

Ben.
