RAID reconstruction problems


I currently have two small software RAID arrays: a RAID 1 for my root
partition and a RAID 5 for my /usr partition.  One of the disks in the
arrays died, and I put in a new disk with the intention of rebuilding
the arrays.

The rebuilds failed, but in an extremely strange fashion.  While a
rebuild is running, /proc/mdstat makes it look as though everything is
going just fine.  When it finishes, however, /proc/mdstat includes the
new disk but also declares it invalid, and the system continues running
in degraded mode.
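
(By "monitoring" I just mean polling the file while the resync runs;
something along the lines of

    watch -n 10 cat /proc/mdstat    # re-read the resync status every 10 seconds

is how I watched the rebuild apparently progress to completion.)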

When I trigger a rebuild from the root console, I get some messages
from the RAID subsystem, including full debugging output.  I have not
yet figured out how to capture this output so that I can include it in
this message, but I did write down part of one attempt by hand, so
there may be small inconsistencies:

RAID5 conf printout
 --- rd:3 wd:2 fd:1
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
 disk 1, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
md: bug in file raid5.c, line 1901
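
(I assume these messages go through printk like the rest of the md
output, so something like

    dmesg > /tmp/raid-rebuild-dmesg.txt   # snapshot the kernel ring buffer right after the attempt
    grep -A 5 'RAID5 conf printout' /var/log/kern.log   # if klogd/syslogd routes kern.* there

should capture them properly next time, though I have not verified that
the full debug printout survives in the ring buffer.)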

Below is some output from my system.  If any more information would be
useful, or if anyone thinks I should try something else, please let me
know.  I would like to get out of this degraded state!

maru:/# uname -a
Linux maru 2.4.21 #3 Fri Aug 29 13:14:01 EDT 2003 i686 GNU/Linux
maru:/# cat ~md5i/dmesg-raid
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   8regs     :  1841.200 MB/sec
   32regs    :   935.600 MB/sec
   pIII_sse  :  2052.000 MB/sec
   pII_mmx   :  2247.600 MB/sec
   p5_mmx    :  2383.200 MB/sec
raid5: using function: pIII_sse (2052.000 MB/sec)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
 [events: 00000198]
 [events: 00000008]
 [events: 00000196]
 [events: 000000f3]
 [events: 00000086]
 [events: 00000008]
md: autorun ...
md: considering ide/host2/bus1/target0/lun0/part3 ...
md:  adding ide/host2/bus1/target0/lun0/part3 ...
md:  adding ide/host0/bus0/target1/lun0/part3 ...
md: created md0
md: bind<ide/host0/bus0/target1/lun0/part3,1>
md: bind<ide/host2/bus1/target0/lun0/part3,2>
md: running: <ide/host2/bus1/target0/lun0/part3><ide/host0/bus0/target1/lun0/part3>
md: ide/host2/bus1/target0/lun0/part3's event counter: 00000008
md: ide/host0/bus0/target1/lun0/part3's event counter: 00000008
md0: max total readahead window set to 496k
md0: 2 data-disks, max readahead per data-disk: 248k
raid5: device ide/host2/bus1/target0/lun0/part3 operational as raid disk 2
raid5: device ide/host0/bus0/target1/lun0/part3 operational as raid disk 0
raid5: md0, not all disks are operational -- trying to recover array
raid5: allocated 3284kB for md0
raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:2 fd:1
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
 disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
RAID5 conf printout:
 --- rd:3 wd:2 fd:1
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
 disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
md: updating md0 RAID superblock on device
md: ide/host2/bus1/target0/lun0/part3 [events: 00000009]<6>(write) ide/host2/bus1/target0/lun0/part3's sb offset: 53640960
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: ide/host0/bus0/target1/lun0/part3 [events: 00000009]<6>(write) ide/host0/bus0/target1/lun0/part3's sb offset: 53616832
md: considering ide/host2/bus1/target0/lun0/part1 ...
md:  adding ide/host2/bus1/target0/lun0/part1 ...
md:  adding ide/host0/bus1/target0/lun0/part1 ...
md:  adding ide/host0/bus0/target1/lun0/part1 ...
md: created md1
md: bind<ide/host0/bus0/target1/lun0/part1,1>
md: bind<ide/host0/bus1/target0/lun0/part1,2>
md: bind<ide/host2/bus1/target0/lun0/part1,3>
md: running: <ide/host2/bus1/target0/lun0/part1><ide/host0/bus1/target0/lun0/part1><ide/host0/bus0/target1/lun0/part1>
md: ide/host2/bus1/target0/lun0/part1's event counter: 00000086
md: ide/host0/bus1/target0/lun0/part1's event counter: 00000196
md: ide/host0/bus0/target1/lun0/part1's event counter: 00000198
md: superblock update time inconsistency -- using the most recent one
md: freshest: ide/host0/bus0/target1/lun0/part1
md: kicking non-fresh ide/host2/bus1/target0/lun0/part1 from array!
md: unbind<ide/host2/bus1/target0/lun0/part1,2>
md: export_rdev(ide/host2/bus1/target0/lun0/part1)
md: kicking non-fresh ide/host0/bus1/target0/lun0/part1 from array!
md: unbind<ide/host0/bus1/target0/lun0/part1,1>
md: export_rdev(ide/host0/bus1/target0/lun0/part1)
md1: removing former faulty ide/host0/bus1/target0/lun0/part1!
md: RAID level 1 does not need chunksize! Continuing anyway.
md1: max total readahead window set to 124k
md1: 1 data-disks, max readahead per data-disk: 124k
raid1: device ide/host0/bus0/target1/lun0/part1 operational as mirror 0
raid1: md1, not all disks are operational -- trying to recover array
raid1: raid set md1 active with 1 out of 2 mirrors
md: updating md1 RAID superblock on device
md: ide/host0/bus0/target1/lun0/part1 [events: 00000199]<6>(write) ide/host0/bus0/target1/lun0/part1's sb offset: 6144704
md: recovery thread got woken up ...
md1: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: considering ide/host0/bus1/target0/lun0/part3 ...
md:  adding ide/host0/bus1/target0/lun0/part3 ...
md: md0 already running, cannot run ide/host0/bus1/target0/lun0/part3
md: export_rdev(ide/host0/bus1/target0/lun0/part3)
md: (ide/host0/bus1/target0/lun0/part3 was pending)
md: ... autorun DONE.
maru:/# cat /proc/mdstat 
Personalities : [raid1] [raid5] 
read_ahead 1024 sectors
md1 : active raid1 ide/host0/bus0/target1/lun0/part1[0]
      6144704 blocks [2/1] [U_]
      
md0 : active raid5 ide/host2/bus1/target0/lun0/part3[2] ide/host0/bus0/target1/lun0/part3[0]
      107233664 blocks level 5, 32k chunk, algorithm 2 [3/2] [U_U]
      
unused devices: <none>
maru:/# lsraid -A -a /dev/md0
[dev   9,   0] /dev/md0         94BF0D82.2B9C1BFB.89401B38.92B8F93B online
[dev   3,  67] /dev/ide/host0/bus0/target1/lun0/part3 94BF0D82.2B9C1BFB.89401B38.92B8F93B good
[dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing
[dev  34,   3] /dev/ide/host2/bus1/target0/lun0/part3 94BF0D82.2B9C1BFB.89401B38.92B8F93B good

maru:/# lsraid -A -a /dev/md1
[dev   9,   1] /dev/md1         0E953226.03C91D46.CD00D52F.83A1334E online
[dev   3,  65] /dev/ide/host0/bus0/target1/lun0/part1 0E953226.03C91D46.CD00D52F.83A1334E good
[dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing

maru:/# cat /etc/raidtab
raiddev /dev/md0
        raid-level      5
        nr-raid-disks   3
        nr-spare-disks  0
        persistent-superblock 1
        parity-algorithm        left-symmetric
        chunk-size      32
        device          /dev/hdb3
        raid-disk       0
        device          /dev/hdc3
        raid-disk       1
        device		/dev/hdg3
        raid-disk       2

raiddev /dev/md1
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  1
        persistent-superblock 1
        chunk-size      4
        device          /dev/hdb1
        raid-disk       0
        device          /dev/hdc1
        raid-disk       1
	device		/dev/hdg1
	spare-disk	0
maru:/# ls -l /dev/hdb1
lr-xr-xr-x    1 root     root           33 Sep  1 18:29 /dev/hdb1 -> ide/host0/bus0/target1/lun0/part1
maru:/# ls -l /dev/hdc1
lr-xr-xr-x    1 root     root           33 Sep  1 18:29 /dev/hdc1 -> ide/host0/bus1/target0/lun0/part1
maru:/# ls -l /dev/hdg1
lr-xr-xr-x    1 root     root           33 Sep  1 18:29 /dev/hdg1 -> ide/host2/bus1/target0/lun0/part1
maru:/# ls -l /dev/hdb3
lr-xr-xr-x    1 root     root           33 Sep  1 18:29 /dev/hdb3 -> ide/host0/bus0/target1/lun0/part3
maru:/# ls -l /dev/hdc3
lr-xr-xr-x    1 root     root           33 Sep  1 18:29 /dev/hdc3 -> ide/host0/bus1/target0/lun0/part3
maru:/# ls -l /dev/hdg3
lr-xr-xr-x    1 root     root           33 Sep  1 18:29 /dev/hdg3 -> ide/host2/bus1/target0/lun0/part3
maru:/# raidhotadd /dev/md1 /dev/hdc1
maru:/# echo Waited for some time...
Waited for some time...
maru:/# cat /proc/mdstat
Personalities : [raid1] [raid5] 
read_ahead 1024 sectors
md1 : active raid1 ide/host0/bus1/target0/lun0/part1[2] ide/host0/bus0/target1/lun0/part1[0]
      6144704 blocks [2/1] [U_]
      
md0 : active raid5 ide/host2/bus1/target0/lun0/part3[2] ide/host0/bus0/target1/lun0/part3[0]
      107233664 blocks level 5, 32k chunk, algorithm 2 [3/2] [U_U]
      
unused devices: <none>
maru:/# 
-- 
Michael Welsh Duggan
(md5i@cs.cmu.edu)
