[Looks like my first message didn't make it to the list, hence sending it again with tarballed attachments]

Dear list members,

among the systems I take care of there's one pretty bog-standard openSUSE 12.1 installation that sticks out with continued device failures on boot. Here is a typical case:

~# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4]
md3 : active raid1 sda4[0]
      869702736 blocks super 1.0 [2/1] [U_]
      bitmap: 57/415 pages [228KB], 1024KB chunk

md0 : active raid1 sda1[0]
      96376 blocks super 1.0 [2/1] [U_]
      bitmap: 1/6 pages [4KB], 8KB chunk

md1 : active (auto-read-only) raid1 sdb2[1] sda2[0]
      2096468 blocks super 1.0 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 128KB chunk

md124 : active raid1 sdb3[1] sda3[0]
      104856180 blocks super 1.0 [2/2] [UU]
      bitmap: 8/200 pages [32KB], 256KB chunk

[kernel log lines left unwrapped on purpose]

Jan 29 20:22:36 zaphkiel kernel: [ 11.047504] md: raid1 personality registered for level 1
Jan 29 20:22:36 zaphkiel kernel: [ 11.549612] md: bind<sda3>
Jan 29 20:22:36 zaphkiel kernel: [ 11.587037] md: bind<sdb3>
Jan 29 20:22:36 zaphkiel kernel: [ 11.630965] md/raid1:md124: active with 2 out of 2 mirrors
Jan 29 20:22:36 zaphkiel kernel: [ 11.708396] md124: bitmap initialized from disk: read 13/13 pages, set 1 of 409595 bits
Jan 29 20:22:36 zaphkiel kernel: [ 11.769213] md124: detected capacity change from 0 to 107372728320
Jan 29 20:22:36 zaphkiel kernel: [ 11.981192] md: raid0 personality registered for level 0
Jan 29 20:22:36 zaphkiel kernel: [ 12.020959] md: raid10 personality registered for level 10
Jan 29 20:22:36 zaphkiel kernel: [ 12.625530] md: raid6 personality registered for level 6
Jan 29 20:22:36 zaphkiel kernel: [ 12.657414] md: raid5 personality registered for level 5
Jan 29 20:22:36 zaphkiel kernel: [ 12.689261] md: raid4 personality registered for level 4
Jan 29 20:22:36 zaphkiel kernel: [ 25.151590] md: bind<sda2>
Jan 29 20:22:36 zaphkiel kernel: [ 25.314284] md: bind<sda1>
Jan 29 20:22:36 zaphkiel kernel: [ 25.409503] md: bind<sda4>
Jan 29 20:22:36 zaphkiel kernel: [ 25.568103] md/raid1:md0: active with 1 out of 2 mirrors
Jan 29 20:22:36 zaphkiel kernel: [ 25.689110] md: bind<sdb2>
Jan 29 20:22:36 zaphkiel kernel: [ 25.713385] md0: bitmap initialized from disk: read 1/1 pages, set 0 of 12047 bits
Jan 29 20:22:36 zaphkiel kernel: [ 25.837207] md0: detected capacity change from 0 to 98689024
Jan 29 20:22:36 zaphkiel kernel: [ 26.045361] md/raid1:md1: active with 2 out of 2 mirrors
Jan 29 20:22:36 zaphkiel kernel: [ 26.260500] md1: bitmap initialized from disk: read 1/1 pages, set 0 of 16379 bits
Jan 29 20:22:36 zaphkiel kernel: [ 26.349129] md1: detected capacity change from 0 to 2146783232
Jan 29 20:22:36 zaphkiel kernel: [ 26.391526] md/raid1:md3: active with 1 out of 2 mirrors
Jan 29 20:22:36 zaphkiel kernel: [ 27.188346] md3: bitmap initialized from disk: read 26/26 pages, set 1547 of 849320 bits
Jan 29 20:22:36 zaphkiel kernel: [ 27.302622] md3: detected capacity change from 0 to 890575601664

This looks like some kind of race during device detection. The full boot sequence log leading to this mess is attached. The major parts operating here are:

mdadm-3.2.2-4.9.1.i586
mkinitrd-2.7.0-39.3.1.i586
kernel-desktop-3.1.10-1.16.1.i586
kernel-desktop-base-3.1.10-1.16.1.i586

Sure, for this particular case the system can be repaired with:

mdadm --add /dev/md0 /dev/sdb1
mdadm --add /dev/md3 /dev/sdb4

but which partition is affected appears to be random; only md124 (the root fs) seems stable. The strange md naming is the result of an upgrade installation.
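Before re-adding, a check along these lines should confirm which half is actually stale (just a sketch of what one would run here, not taken from the attached logs; the grep pattern is only illustrative):

~# mdadm --examine /dev/sda1 /dev/sdb1 | grep -iE 'update time|events'
~# mdadm --examine /dev/sda4 /dev/sdb4 | grep -iE 'update time|events'

The member that is still active in the degraded array should show the higher event count and the newer update time; if it does, re-adding the missing half is safe, since the resync overwrites the device being added.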
The device details are attached as well. It also happens that the active device *switches* between boots, which is a perfect recipe for actually losing data; run this way, md doesn't add to data safety, it becomes the reason for losing it.

Could some kind soul tell me what's going on here?

Thanks in advance,
Pete
Attachment: details-and-log.tar.bz2 (application/bzip-compressed-tar)