raid5 not responding (LONG)


 



Hello

(I just subscribed and I'm not sure whether this list is for user support.
If it isn't, sorry; please direct me to the appropriate resource. Also, sorry
for the long post, but I'm trying to give all the pertinent info at once.)

I have a small ProLiant here with three 9.1GB SCSI disks arranged in raid5
and shared via samba to a dozen windows users. Today it stopped responding
to everything; even an 'ls /mnt/u01' (where the array is mounted) never
comes back! From the system logs, it looks like a disk has failed, but I do
not know how to restore proper operation. It still hasn't come back online,
more than an hour after the initial failure.

Any advice on where to start troubleshooting this would be very much
appreciated.

Relevant info follows:

[root@edrs log]# uname -a
Linux edrs 2.4.18-27.7.x #1 Fri Mar 14 06:44:53 EST 2003 i686 unknown

[root@edrs root]# uptime
  8:36pm  up 14 days,  5:58,  5 users,  load average: 56.99, 56.61, 54.43

[root@edrs log]# cat /etc/redhat-release
Red Hat Linux release 7.3 (Valhalla)

[root@edrs log]# rpm -q raidtools
raidtools-1.00.2-1.3

[root@edrs log]# cat /etc/raidtab
raiddev /dev/md0
        raid-level      5
        nr-raid-disks   3
        nr-spare-disks  0
        persistent-superblock   1
        parity-algorithm        left-symmetric
        chunk-size      32
        device          /dev/sdb1
        raid-disk       0
        device          /dev/sdc1
        raid-disk       1
        device          /dev/sdd1
        raid-disk       2
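
As a reading aid for the raidtab above, here is a minimal Python sketch of how
a left-symmetric RAID5 layout (reported as "algorithm 2" in /proc/mdstat)
spreads data and parity chunks over the three members. The disk count and
chunk size are taken from the raidtab; the function names are made up for
illustration, and the sketch assumes member data starts at sector 0 of each
partition.

```python
# Sketch: chunk placement for a RAID5 left-symmetric layout.
# RAID_DISKS and CHUNK_KB come from the raidtab; the function names
# (parity_disk, locate) are hypothetical helpers, not part of any tool.

RAID_DISKS = 3                                    # nr-raid-disks
CHUNK_KB = 32                                     # chunk-size, in KB
SECTOR_BYTES = 512
CHUNK_SECTORS = CHUNK_KB * 1024 // SECTOR_BYTES   # 64 sectors per chunk


def parity_disk(stripe, raid_disks=RAID_DISKS):
    """Return the raid-disk index that holds parity for a stripe."""
    data_disks = raid_disks - 1
    return data_disks - (stripe % raid_disks)


def locate(array_sector, raid_disks=RAID_DISKS, chunk_sectors=CHUNK_SECTORS):
    """Map a sector of the array to (raid_disk, sector_on_that_member)."""
    data_disks = raid_disks - 1
    chunk_number, offset = divmod(array_sector, chunk_sectors)
    stripe, data_idx = divmod(chunk_number, data_disks)
    # Data chunks start on the disk right after the parity disk and wrap.
    disk = (parity_disk(stripe, raid_disks) + 1 + data_idx) % raid_disks
    return disk, stripe * chunk_sectors + offset


if __name__ == "__main__":
    for stripe in range(3):
        print("stripe", stripe, "parity on raid-disk", parity_disk(stripe))
```

With three members the parity chunk rotates through raid-disks 2, 1, 0 and
then repeats, so every member (sdb1 included) carries a mix of data and
parity.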


[root@edrs log]# cat /etc/fstab
/dev/sda8               /                       ext3    defaults        1 1
/dev/sda1               /boot                   ext3    defaults        1 2
/dev/sda6               /home                   ext3    defaults        1 2
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,ro 0 0
/dev/sda5               /usr                    ext3    defaults        1 2
/dev/sda7               /var                    ext3    defaults        1 2
/dev/md0                /mnt/u01                ext3    defaults        1 2
/dev/fd0                /mnt/floppy             auto    noauto,owner    0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
none                    /dev/pts                devpts  gid=5,mode=620  0 0
/dev/sda9               swap                    swap    defaults        0 0
/var/SWAP               swap                    swap    defaults        0 0

[root@edrs log]# cat /etc/mtab
/dev/sda8 / ext3 rw 0 0
none /proc proc rw 0 0
/dev/sda1 /boot ext3 rw 0 0
/dev/sda6 /home ext3 rw 0 0
/dev/sda5 /usr ext3 rw 0 0
/dev/sda7 /var ext3 rw 0 0
/dev/md0 /mnt/u01 ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0


[root@edrs log]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdd1[2] sdc1[1] sdb1[0](F)
      17767680 blocks level 5, 32k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
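
The status above can also be read mechanically. The following is a minimal
Python sketch, assuming the 2.4-era /proc/mdstat format shown here; the
function name summarize and the dictionary keys are made up for illustration.
It pulls out the failed member and the [configured/working] counts.

```python
import re

# Sample text copied from the /proc/mdstat output above.
MDSTAT = """\
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdd1[2] sdc1[1] sdb1[0](F)
      17767680 blocks level 5, 32k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""


def summarize(text):
    """Return {array: info} with failed members and degraded status."""
    arrays = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"^(md\d+) : (\w+) (\w+) (.*)$", line)
        if head:
            name, state, level, members = head.groups()
            # Members flagged (F) have been kicked out of the array.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", members)
            current = {"state": state, "level": level, "failed": failed}
            arrays[name] = current
            continue
        status = re.search(
            r"(\d+) blocks.*\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if status and current is not None:
            current["blocks"] = int(status.group(1))
            current["configured"] = int(status.group(2))
            current["working"] = int(status.group(3))
            current["degraded"] = current["working"] < current["configured"]
    return arrays


if __name__ == "__main__":
    print(summarize(MDSTAT))
```

For this output the sketch reports sdb1 as failed and md0 as degraded with 2
of 3 members working. The block count is also consistent with the boot log
further down: 17767680 is twice the 8883840 blocks reported per member, as
expected for two data disks' worth of capacity.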



[root@edrs log]# tail -20 /var/log/messages
Aug 11 18:34:57 edrs kernel: scsi0: ERROR on channel 0, id 1, lun 0, CDB:
Read (10) 00 00 03 c1 2f 00 00 40 00
Aug 11 18:34:57 edrs kernel: Info fld=0x3c134, Current sd08:11: sense key
Medium Error
Aug 11 18:34:57 edrs kernel: Additional sense indicates Unrecovered read
error - recommend reassignment
Aug 11 18:34:57 edrs kernel:  I/O error: dev 08:11, sector 246000
Aug 11 18:34:57 edrs kernel: raid5: Disk failure on sdb1, disabling device.
Operation continuing on 2 devices
Aug 11 18:34:57 edrs kernel: md: updating md0 RAID superblock on device
Aug 11 18:34:57 edrs kernel: md: sdd1 [events: 00000038]<6>(write) sdd1's sb
offset: 8883840
Aug 11 18:34:57 edrs kernel: md: recovery thread got woken up ...
Aug 11 18:34:57 edrs kernel: md0: no spare disk to reconstruct array! --
continuing in degraded mode
Aug 11 18:34:57 edrs kernel: md: recovery thread finished ...
Aug 11 19:01:00 edrs CROND[18309]: (root) CMD (run-parts /etc/cron.hourly)
Aug 11 19:01:48 edrs sshd(pam_unix)[18318]: session opened for user root by
(uid=0)
Aug 11 19:02:36 edrs sshd(pam_unix)[18359]: session opened for user root by
(uid=0)
Aug 11 19:03:42 edrs sshd(pam_unix)[18402]: session opened for user root by
(uid=0)
Aug 11 19:17:32 edrs kernel: lease timed out
Aug 11 20:01:00 edrs CROND[18539]: (root) CMD (run-parts /etc/cron.hourly)
Aug 11 20:01:03 edrs sshd(pam_unix)[18537]: session opened for user root by
(uid=0)
Aug 11 20:06:15 edrs kernel: 10.65.44.10 sent an invalid ICMP error to a
broadcast.
Aug 11 20:06:15 edrs kernel: 10.70.44.10 sent an invalid ICMP error to a
broadcast.
Aug 11 20:18:15 edrs kernel: 10.70.44.9 sent an invalid ICMP error to a
broadcast.
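
To make the numbers in the SCSI error easier to read, here is a small Python
sketch that decodes them. It assumes that "08:11" is a hexadecimal
major:minor pair, that SCSI disks use major 8 with 16 minors per disk, and
that the nine bytes printed after "Read (10)" are the rest of the 10-byte
command (flags, 4-byte block address, reserved byte, 2-byte length, control).
The function names are made up for illustration.

```python
def decode_dev(dev):
    """Turn a hex 'major:minor' pair such as '08:11' into a device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    disk, partition = divmod(minor, 16)      # 16 minors per SCSI disk
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")


def decode_read10(tail_hex):
    """Decode the nine bytes after the opcode into (lba, block_count)."""
    raw = bytes.fromhex(tail_hex.replace(" ", ""))
    if len(raw) != 9:
        raise ValueError("expected nine bytes after the Read (10) opcode")
    lba = int.from_bytes(raw[1:5], "big")    # logical block address
    blocks = int.from_bytes(raw[6:8], "big") # transfer length in blocks
    return lba, blocks


if __name__ == "__main__":
    print(decode_dev("08:11"))
    lba, blocks = decode_read10("00 00 03 c1 2f 00 00 40 00")
    bad = int("3c134", 16)                   # the "Info fld" value
    print(lba, blocks, bad, bad - lba)
```

Under those assumptions, 08:11 is sdb1, the failing read starts at disk block
246063 and covers 64 blocks, and the Info field points at block 246068, the
sixth block of that request. The request start is 63 above the "sector
246000" in the I/O error line, which would fit a partition beginning at
sector 63, although the partition table is not shown here, so that part is a
guess.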





Looking back at the logs from the last boot, I found:

[root@edrs root]# cat /var/log/messages.2
...<snip>...
Jul 28 14:39:01 edrs fsck: /dev/md0: recovering journal
Jul 28 14:39:08 edrs fsck: /dev/md0: clean, 20402/2223872 files,
2441914/4441920 blocks
Jul 28 14:39:08 edrs rc.sysinit: Checking filesystems succeeded
Jul 28 14:39:10 edrs rc.sysinit: Mounting local filesystems:  succeeded
...<snip>...
Jul 28 14:40:22 edrs kernel: md: md driver 0.90.0 MAX_MD_DEVS=256,
MD_SB_DISKS=27
Jul 28 14:40:22 edrs kernel: md: Autodetecting RAID arrays.
Jul 28 14:40:22 edrs kernel: md: autorun ...
Jul 28 14:40:23 edrs kernel: md: ... autorun DONE.
Jul 28 14:40:25 edrs kernel: SCSI subsystem driver Revision: 1.00
Jul 28 14:40:26 edrs kernel: kmod: failed to exec /sbin/modprobe -s -k
scsi_hostadapter, errno = 2
Jul 28 14:40:26 edrs kernel: ncr53c8xx: at PCI bus 1, device 9, function 0
Jul 28 14:40:26 edrs kernel: ncr53c8xx: 53c875 detected
Jul 28 14:40:26 edrs kernel: ncr53c8xx: at PCI bus 1, device 9, function 1
Jul 28 14:40:26 edrs kernel: ncr53c8xx: 53c875 detected
Jul 28 14:40:26 edrs kernel: ncr53c875-0: rev 0x14 on pci bus 1 device 9
function 0 irq 9
Jul 28 14:40:26 edrs kernel: ncr53c875-0: ID 7, Fast-20, Parity Checking
Jul 28 14:40:27 edrs kernel: ncr53c875-1: rev 0x14 on pci bus 1 device 9
function 1 irq 10
Jul 28 14:40:27 edrs kernel: ncr53c875-1: ID 7, Fast-20, Parity Checking
Jul 28 14:40:27 edrs kernel: scsi0 : ncr53c8xx-3.4.3b-20010512
Jul 28 14:40:27 edrs kernel: scsi1 : ncr53c8xx-3.4.3b-20010512
Jul 28 14:40:28 edrs kernel: ncr53c875-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s
(50 ns, offset 16)
Jul 28 14:40:28 edrs kernel:   Vendor: COMPAQ    Model: MAB3045SC
Rev: 0814
Jul 28 14:40:29 edrs kernel:   Type:   Direct-Access
ANSI SCSI revision: 02
Jul 28 14:40:29 edrs kernel: ncr53c875-0-<1,*>: FAST-20 WIDE SCSI 40.0 MB/s
(50 ns, offset 15)
Jul 28 14:40:29 edrs kernel:   Vendor: COMPAQ    Model: DGHS09Y
Rev: 01C0
Jul 28 14:40:29 edrs kernel:   Type:   Direct-Access
ANSI SCSI revision: 03
Jul 28 14:40:29 edrs kernel: ncr53c875-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s
(50 ns, offset 16)
Jul 28 14:40:29 edrs kernel:   Vendor: COMPAQ    Model: MAB3091SC
Rev: 0814
Jul 28 14:40:29 edrs kernel:   Type:   Direct-Access
ANSI SCSI revision: 02
Jul 28 14:40:29 edrs kernel: ncr53c875-0-<3,*>: FAST-20 WIDE SCSI 40.0 MB/s
(50 ns, offset 15)
Jul 28 14:40:29 edrs kernel:   Vendor: COMPAQ    Model: DGHS09Y
Rev: 01C0
Jul 28 14:40:29 edrs kernel:   Type:   Direct-Access
ANSI SCSI revision: 03
Jul 28 14:40:29 edrs kernel: Attached scsi disk sda at scsi0, channel 0, id
0, lun 0
Jul 28 14:40:29 edrs kernel: Attached scsi disk sdb at scsi0, channel 0, id
1, lun 0
Jul 28 14:40:30 edrs kernel: Attached scsi disk sdc at scsi0, channel 0, id
2, lun 0
Jul 28 14:40:30 edrs kernel: Attached scsi disk sdd at scsi0, channel 0, id
3, lun 0
Jul 28 14:40:30 edrs kernel: SCSI device sda: 8386000 512-byte hdwr sectors
(4294 MB)
Jul 28 14:40:30 edrs kernel: Partition check:
Jul 28 14:40:30 edrs kernel:  sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 >
Jul 28 14:40:30 edrs kernel: SCSI device sdb: 17773500 512-byte hdwr sectors
(9100 MB)
Jul 28 14:40:30 edrs kernel:  sdb: sdb1
Jul 28 14:40:30 edrs kernel: SCSI device sdc: 17773500 512-byte hdwr sectors
(9100 MB)
Jul 28 14:40:30 edrs kernel:  sdc: sdc1
Jul 28 14:40:31 edrs kernel: SCSI device sdd: 17773500 512-byte hdwr sectors
(9100 MB)
Jul 28 14:40:31 edrs kernel:  sdd: sdd1
Jul 28 14:40:32 edrs kernel: raid5: measuring checksumming speed
Jul 28 14:40:32 edrs kernel:    8regs     :   733.184 MB/sec
Jul 28 14:40:32 edrs kernel:    32regs    :   346.112 MB/sec
Jul 28 14:40:32 edrs kernel:    pII_mmx   :   894.976 MB/sec
Jul 28 14:40:32 edrs kernel:    p5_mmx    :   933.888 MB/sec
Jul 28 14:40:33 edrs kernel: raid5: using function: p5_mmx (933.888 MB/sec)
Jul 28 14:40:33 edrs kernel: md: raid5 personality registered as nr 4
Jul 28 14:40:33 edrs kernel: Journalled Block Device driver loaded
Jul 28 14:40:33 edrs kernel: md: Autodetecting RAID arrays.
Jul 28 14:40:33 edrs kernel:  [events: 00000036]
Jul 28 14:40:34 edrs kernel:  [events: 00000036]
Jul 28 14:40:34 edrs kernel:  [events: 00000036]
Jul 28 14:40:34 edrs kernel: md: autorun ...
Jul 28 14:40:35 edrs kernel: md: considering sdd1 ...
Jul 28 14:40:35 edrs kernel: md:  adding sdd1 ...
Jul 28 14:40:35 edrs kernel: md:  adding sdc1 ...
Jul 28 14:40:36 edrs kernel: md:  adding sdb1 ...
Jul 28 14:40:36 edrs kernel: md: created md0
Jul 28 14:40:36 edrs kernel: md: bind<sdb1,1>
Jul 28 14:40:37 edrs kernel: md: bind<sdc1,2>
Jul 28 14:40:37 edrs kernel: md: bind<sdd1,3>
Jul 28 14:40:37 edrs kernel: md: running: <sdd1><sdc1><sdb1>
Jul 28 14:40:37 edrs kernel: md: sdd1's event counter: 00000036
Jul 28 14:40:38 edrs kernel: md: sdc1's event counter: 00000036
Jul 28 14:40:38 edrs kernel: md: sdb1's event counter: 00000036
Jul 28 14:40:39 edrs kernel: md: md0: raid array is not clean -- starting
background reconstruction
Jul 28 14:40:39 edrs kernel: md0: max total readahead window set to 496k
Jul 28 14:40:39 edrs kernel: md0: 2 data-disks, max readahead per data-disk:
248k
Jul 28 14:40:40 edrs kernel: raid5: device sdd1 operational as raid disk 2
Jul 28 14:40:40 edrs kernel: raid5: device sdc1 operational as raid disk 1
Jul 28 14:40:40 edrs kernel: raid5: device sdb1 operational as raid disk 0
Jul 28 14:40:40 edrs kernel: raid5: allocated 3291kB for md0
Jul 28 14:40:41 edrs kernel: raid5: raid level 5 set md0 active with 3 out
of 3 devices, algorithm 2
Jul 28 14:40:41 edrs kernel: raid5: raid set md0 not clean; reconstructing
parity
Jul 28 14:40:41 edrs kernel: RAID5 conf printout:
Jul 28 14:40:42 edrs kernel:  --- rd:3 wd:3 fd:0
Jul 28 14:40:42 edrs kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdb1
Jul 28 14:40:42 edrs kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc1
Jul 28 14:40:43 edrs kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd1
Jul 28 14:40:43 edrs kernel: RAID5 conf printout:
Jul 28 14:40:44 edrs kernel:  --- rd:3 wd:3 fd:0
Jul 28 14:40:44 edrs kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdb1
Jul 28 14:40:44 edrs kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc1
Jul 28 14:40:45 edrs kernel:  disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd1
Jul 28 14:40:45 edrs kernel: md: updating md0 RAID superblock on device
Jul 28 14:40:45 edrs kernel: md: sdd1 [events: 00000037]<6>(write) sdd1's sb
offset: 8883840
Jul 28 14:40:45 edrs kernel: md: syncing RAID array md0
Jul 28 14:40:45 edrs kernel: md: minimum _guaranteed_ reconstruction speed:
100 KB/sec/disc.
Jul 28 14:40:45 edrs kernel: md: using maximum available idle IO bandwith
(but not more than 10000 KB/sec) for reconstruction.
Jul 28 14:40:45 edrs kernel: md: using 508k window, over a total of 8883840
blocks.
Jul 28 14:40:45 edrs kernel: md: sdc1 [events: 00000037]<6>(write) sdc1's sb
offset: 8883840
Jul 28 14:40:46 edrs kernel: md: sdb1 [events: 00000037]<6>(write) sdb1's sb
offset: 8883840
Jul 28 14:40:46 edrs kernel: md: ... autorun DONE.
Jul 28 14:40:46 edrs kernel: EXT3-fs: INFO: recovery required on readonly
filesystem.
Jul 28 14:40:46 edrs kernel: EXT3-fs: write access will be enabled during
recovery.
Jul 28 14:40:46 edrs kernel: kjournald starting.  Commit interval 5 seconds
Jul 28 14:40:46 edrs kernel: EXT3-fs: recovery complete.
Jul 28 14:40:46 edrs kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Jul 28 14:40:46 edrs kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,8),
internal journal
Jul 28 14:40:47 edrs kernel: kjournald starting.  Commit interval 5 seconds
Jul 28 14:40:47 edrs kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,1),
internal journal
Jul 28 14:40:47 edrs kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Jul 28 14:40:47 edrs kernel: kjournald starting.  Commit interval 5 seconds
Jul 28 14:40:47 edrs kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,6),
internal journal
Jul 28 14:40:47 edrs kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Jul 28 14:40:47 edrs kernel: kjournald starting.  Commit interval 5 seconds
Jul 28 14:40:47 edrs kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,5),
internal journal
Jul 28 14:40:47 edrs kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Jul 28 14:40:48 edrs kernel: kjournald starting.  Commit interval 5 seconds
Jul 28 14:40:48 edrs kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,7),
internal journal
Jul 28 14:40:48 edrs kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Jul 28 14:40:48 edrs kernel: kjournald starting.  Commit interval 5 seconds
Jul 28 14:40:48 edrs kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,0),
internal journal
Jul 28 14:40:48 edrs kernel: EXT3-fs: mounted filesystem with ordered data
mode.
...<snip>...
Jul 28 14:53:31 edrs kernel: md: md0: sync done.
Jul 28 14:53:31 edrs kernel: raid5: resync finished.



thanks in advance
cl.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
