I have an Ubuntu 9.04 system with the default mdadm and kernel (2.6.28).
During the night, two drives out of my 6-drive RAID5 were kicked out due to
SATA timeouts. I rebooted the system and tried to assemble the array, but
it would end up with 2 spares and 4 working drives, which is not enough to
start it.
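(From memory, the assemble attempts looked roughly like this; the exact
invocations may have differed slightly:

# mdadm --stop /dev/md0
# mdadm --assemble /dev/md0 /dev/sd[bcdefg]
# cat /proc/mdstat
# mdadm --detail /dev/md0

That is where the "2 spares and 4 working drives" picture came from.)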
Further examination showed that of the two drives that were not working,
one was "active" (as opposed to "clean"), and the other had a different
event count from all the other drives.
Running --assemble --force yielded:
[ 7748.103782] md: bind<sdg>
[ 7748.103989] md: bind<sdb>
[ 7748.104164] md: bind<sdc>
[ 7748.104315] md: bind<sde>
[ 7748.104456] md: bind<sdf>
[ 7748.104631] md: bind<sdd>
[ 7748.104664] md: kicking non-fresh sde from array!
[ 7748.104684] md: unbind<sde>
[ 7748.120532] md: export_rdev(sde)
[ 7748.120554] md: md0: raid array is not clean -- starting background reconstruction
[ 7748.122135] raid5: device sdd operational as raid disk 0
[ 7748.122153] raid5: device sdf operational as raid disk 5
[ 7748.122169] raid5: device sdc operational as raid disk 3
[ 7748.122186] raid5: device sdb operational as raid disk 2
[ 7748.122202] raid5: device sdg operational as raid disk 1
[ 7748.122218] raid5: cannot start dirty degraded array for md0
[ 7748.122234] RAID5 conf printout:
[ 7748.122248] --- rd:6 wd:5
[ 7748.122261] disk 0, o:1, dev:sdd
[ 7748.122275] disk 1, o:1, dev:sdg
[ 7748.122289] disk 2, o:1, dev:sdb
[ 7748.122303] disk 3, o:1, dev:sdc
[ 7748.122317] disk 5, o:1, dev:sdf
[ 7748.122331] raid5: failed to run raid set md0
[ 7748.122346] md: pers->run() failed ...
# mdadm --examine /dev/sd[bcdefg] | grep -i State
State : clean
Array State : _uUu_u 5 failed
State : clean
Array State : _uuU_u 5 failed
State : active
Array State : Uuuu_u 4 failed
State : clean
Array State : uuuuUu 3 failed
State : clean
Array State : _uuu_U 5 failed
State : clean
Array State : _Uuu_u 5 failed
So sde has a different event count, and sdd is "active", which I guess is
what makes the array dirty.
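To compare the event counts I used something along these lines (the exact
grep pattern may have been slightly different):

# mdadm --examine /dev/sd[bcdefg] | egrep -i 'events|state'

All members except sde reported the same event count.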
Now, I mucked around a bit with assemble and examine, and at one point the
array tried to start with only two drives:
[ 8803.797521] raid5: device sdc operational as raid disk 3
[ 8803.797541] raid5: device sdd operational as raid disk 0
[ 8803.797558] raid5: not enough operational devices for md0 (4/6 failed)
[ 8803.797575] RAID5 conf printout:
[ 8803.797589] --- rd:6 wd:2
[ 8803.797602] disk 0, o:1, dev:sdd
[ 8803.797616] disk 3, o:1, dev:sdc
[ 8803.797630] raid5: failed to run raid set md0
[ 8803.797645] md: pers->run() failed ...
I then stopped and started it twice more, and all of a sudden it assembled
correctly with all six drives and started reconstruction (the rough command
sequence is sketched after the log below):
[ 8842.040824] md: md0 stopped.
[ 8842.040853] md: unbind<sdc>
[ 8842.056512] md: export_rdev(sdc)
[ 8842.056549] md: unbind<sdd>
[ 8842.068510] md: export_rdev(sdd)
[ 8865.784578] md: md0 stopped.
[ 8867.003573] md: bind<sdg>
[ 8867.003753] md: bind<sdb>
[ 8867.003981] md: bind<sdc>
[ 8867.004148] md: bind<sde>
[ 8867.004291] md: bind<sdf>
[ 8867.004489] md: bind<sdd>
[ 8867.004522] md: kicking non-fresh sde from array!
[ 8867.004541] md: unbind<sde>
[ 8867.020030] md: export_rdev(sde)
[ 8867.020052] md: md0: raid array is not clean -- starting background reconstruction
[ 8867.021633] raid5: device sdd operational as raid disk 0
[ 8867.021651] raid5: device sdf operational as raid disk 5
[ 8867.021667] raid5: device sdc operational as raid disk 3
[ 8867.021683] raid5: device sdb operational as raid disk 2
[ 8867.021699] raid5: device sdg operational as raid disk 1
[ 8867.021715] raid5: cannot start dirty degraded array for md0
[ 8867.021731] RAID5 conf printout:
[ 8867.021745] --- rd:6 wd:5
[ 8867.021759] disk 0, o:1, dev:sdd
[ 8867.021772] disk 1, o:1, dev:sdg
[ 8867.021786] disk 2, o:1, dev:sdb
[ 8867.021800] disk 3, o:1, dev:sdc
[ 8867.021814] disk 5, o:1, dev:sdf
[ 8867.021828] raid5: failed to run raid set md0
[ 8867.021843] md: pers->run() failed ...
[ 9044.443949] md: md0 stopped.
[ 9044.443981] md: unbind<sdd>
[ 9044.452013] md: export_rdev(sdd)
[ 9044.452066] md: unbind<sdf>
[ 9044.464011] md: export_rdev(sdf)
[ 9044.464039] md: unbind<sdc>
[ 9044.476009] md: export_rdev(sdc)
[ 9044.476056] md: unbind<sdb>
[ 9044.492010] md: export_rdev(sdb)
[ 9044.492037] md: unbind<sdg>
[ 9044.504010] md: export_rdev(sdg)
[ 9297.387893] md: bind<sdd>
[ 9301.337867] md: bind<sdc>
[ 9399.256047] md: md0 still in use.
[ 9399.678154] md: array md0 already has disks!
[ 9409.702060] md: md0 stopped.
[ 9409.702087] md: unbind<sdc>
[ 9409.712012] md: export_rdev(sdc)
[ 9409.712062] md: unbind<sdd>
[ 9409.724009] md: export_rdev(sdd)
[ 9411.880427] md: md0 still in use.
[ 9413.518157] md: bind<sdg>
[ 9413.518357] md: bind<sdb>
[ 9413.518527] md: bind<sdc>
[ 9413.518675] md: bind<sde>
[ 9413.518817] md: bind<sdf>
[ 9413.518987] md: bind<sdd>
[ 9413.519019] md: md0: raid array is not clean -- starting background reconstruction
[ 9413.521094] raid5: device sdd operational as raid disk 0
[ 9413.521113] raid5: device sdf operational as raid disk 5
[ 9413.521129] raid5: device sde operational as raid disk 4
[ 9413.521145] raid5: device sdc operational as raid disk 3
[ 9413.521162] raid5: device sdb operational as raid disk 2
[ 9413.521178] raid5: device sdg operational as raid disk 1
[ 9413.521859] raid5: allocated 6396kB for md0
[ 9413.521875] raid5: raid level 5 set md0 active with 6 out of 6 devices, algorithm 2
[ 9413.521901] RAID5 conf printout:
[ 9413.521915] --- rd:6 wd:6
[ 9413.521928] disk 0, o:1, dev:sdd
[ 9413.521942] disk 1, o:1, dev:sdg
[ 9413.521956] disk 2, o:1, dev:sdb
[ 9413.521970] disk 3, o:1, dev:sdc
[ 9413.521984] disk 4, o:1, dev:sde
[ 9413.521998] disk 5, o:1, dev:sdf
[ 9413.522145] md0: detected capacity change from 0 to 10001993891840
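For completeness, the stop/start cycle that finally worked was roughly the
following (again from memory, so take the exact flags with a grain of salt):

# mdadm --stop /dev/md0
# mdadm --assemble --force /dev/md0 /dev/sd[bcdefg]
# cat /proc/mdstat

The last assemble is the one that brought the array up with 6 out of 6
devices and kicked off the background reconstruction shown above.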
Anyone have any idea what's going on?
--
Mikael Abrahamsson email: swmike@xxxxxxxxx