Re: Raid6 array crashed-- 4-disk failure...(?)

Neil Brown wrote:
On Monday September 15, maarten@xxxxxxxxxxxx wrote:
This weekend I promoted my new 6-disk raid6 array to production use and was busy copying data to it overnight. The next morning the machine had crashed, and the array is down with an (apparent?) 4-disk failure, as witnessed by this info:

Pity about that crash.  I don't suppose there are any useful kernel
logs leading up to it.  Maybe the machine needs more burn-in testing
before going into production?

The thing is, I tested the array for months on a new install that was running on spare hardware. Then this weekend I swapped the new OS together with the new disks to the fileserver. The fileserver was running well on the old OS. So indeed, maybe there is a mismatch between the new kernel and the hardware... But I did test-drive the raid-6 code for a couple of months.

md5 : inactive sdj1[2](S) sdb1[5](S) sda1[4](S) sdf1[3](S) sdc1[1](S) sdk1[0](S)
       2925435648 blocks

That suggests that the kernel tried to assemble the array, but failed
because it was too degraded.

apoc ~ # mdadm --assemble /dev/md5 /dev/sd[abcfjk]1
mdadm: /dev/md5 assembled from 2 drives - not enough to start the array.

apoc log # fdisk -l|grep 4875727
/dev/sda1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdb1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdc1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdf1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdj1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdk1        1       60700   487572718+  fd  Linux raid autodetect

apoc log # mdadm --examine /dev/sd[abcfjk]1|grep Events
          Events : 0.1057345
          Events : 0.1057343
          Events : 0.1057343
          Events : 0.1057343
          Events : 0.1057345
          Events : 0.1057343


So sda1 and sdj1 are newer, but not by much.
Looking at the full --examine output below, the time difference
between 1057343 and 1057345 is 61 seconds.  That is probably one or
two device timeouts.

Ah. How can you tell, I did not know this...
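For readers following along, one way to see the same thing is to line up each member's event counter with its last update time. The sketch below is only an illustration: `summarize_examine` is a hypothetical helper name, and it assumes the 0.90-style `--examine` layout, where each device section starts with a "/dev/...:" line and contains "Update Time :" and "Events :" fields.

```shell
# Read `mdadm --examine` output on stdin and print one line per device:
#   <device> <event counter> <update time>
# Hypothetical helper; assumes the 0.90-style field names noted above.
summarize_examine() {
  awk '
    # Print the record collected for the previous device, if any.
    function emit() { if (dev != "") print dev, events, when }

    # A new device section begins, e.g. "/dev/sda1:".
    /^\/dev\/[^ ]*:$/ { emit(); dev = substr($1, 1, length($1) - 1); events = ""; when = "" }

    # Keep everything after "Update Time : " as the timestamp.
    /Update Time :/   { sub(/^.*Update Time : */, ""); when = $0 }

    # The event counter is the last field on the "Events :" line.
    /Events :/        { events = $NF }

    END { emit() }
  '
}
```

It would be used as `mdadm --examine /dev/sd[abcfjk]1 | summarize_examine`. Members that share an event counter should show the same update time, and the gap between the two groups is the time difference mentioned above.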

'a' and 'j' think that 'k' failed and was removed.  Everyone else
thinks that the world is a happy place.

So I suspect that an IO to k failed, and the attempt to update the
metadata worked on 'a' and 'j' but not anywhere else.  So then the
array just stopped.  When md tried to update 'a' and 'j' with the new
failure information, it failed on them as well.

Note: the array was built half-degraded, i.e. it is missing one disk. This is how it was displayed when it was still OK yesterday:

md5 : active raid6 sdk1[0] sdj1[2] sdf1[3] sdc1[1] sdb1[5] sda1[4]
       2437863040 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]


By these event counters, one would maybe assume that 4 disks failed simultaneously, however weird this may be. But when looking at the other info of the examine command, this seems unlikely: all drives report (I think) that they were online until the end, except for two drives. The first drive of those two is the one that reports it has failed. The second is the one that 'sees' that that first drive did fail. All the others seem oblivious to that... I included that data below at the end.

Not quite.  'k' is reported as failed, and 'a' and 'j' know this.


My questions...

1) Is my analysis correct so far ?

Not exactly, but fairly close.

2) Can/should I try to assemble --force, or is that very bad in these circumstances?

Yes, you should assemble with --force.  The evidence is strong that
nothing was successfully written after 'k' failed, so all the data
should be consistent.  You will need to sit through a recovery which
probably won't make any changes, but it is certainly safest to let it
try.
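The forced assembly could look roughly like the sketch below. It is a cautious illustration, not a tested procedure for this machine: `run` and `force_assemble` are hypothetical helper names, the commands are only printed unless RUN=1 is set, and stopping the inactive array before reassembling is an assumption based on /proc/mdstat still listing md5 as inactive.

```shell
# Dry-run wrapper: print each command unless RUN=1 is set in the environment.
# `run` and `force_assemble` are hypothetical helpers, not part of mdadm.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Usage: force_assemble <md device> <member partitions...>
force_assemble() {
  md=$1
  shift
  run mdadm --stop "$md"                    # release the inactive, half-assembled array
  run mdadm --assemble --force "$md" "$@"   # accept members whose event counts lag slightly
  run cat /proc/mdstat                      # check array state and recovery progress
}
```

Called as `force_assemble /dev/md5 /dev/sd[abcfjk]1`, this prints the three commands for review. Running it again with RUN=1 (as root) executes them.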


3) Should I say farewell to my ~2400 GB of data ? :-(

Not yet.

4) If it was only a one-drive failure, why did it kill the array ?

It wasn't just one drive.  Maybe it was a controller/connector
failure.  Maybe when one drive failed it did bad things to the bus.
It is hard to know for sure.
Are these drives SATA or SCSI or SAS or ???

Eh, SATA. The machine has 4 4-port SATA controllers on 33MHz PCI busses.
Yes, that kills performance, but what can you do. It still outperforms the network. Re-seating the PCI cards may be a good idea. However, I think (am sure) the drives were not on the same controllers: a thru d are on card #1, e thru h on the second card, etc.

5) Any insight as to how this happened / can be prevented in future ?

See above.
You need to identify the failing component and correct it - either
replace or re-seat or whatever is needed.
Finding the failing component is not easy.   Lots of burn-in testing
and catching any kernel logs if/when it crashes is your best bet.

Ok, I'll read up on using the Magic SysRq key, too. The logs were completely empty at the time of the crash and the keyboard was unresponsive, so it was a full kernel panic.

Good luck.

Thanks for your help Neil !

Maarten

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
