Hi Neil
What would be interesting to see is the --examine output and the dmesg just after the recovery that follows the add has completed, i.e. just before the reboot. The dmesg you have included is from after the reboot. It confirms that sdb5 is non-fresh, presumably because its event count is behind for some reason (as can be seen from the --examine output you sent in the first email). However, it doesn't contain any hint as to why.

NeilBrown
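As an illustration of the event-count comparison described above, here is a minimal Python sketch that pulls the Events field out of `mdadm --examine` text and reports members that are behind the newest count or have no superblock at all. The function names `parse_events` and `stale_members` are hypothetical, and the sketch assumes the `Events : hi.lo` layout printed for 0.90 superblocks; the sample strings are copied from the output in this thread.

```python
import re


def parse_events(examine_text):
    """Return the Events counter from `mdadm --examine` text as a tuple of ints.

    Assumption: the counter is printed as "Events : 0.3796224" (0.90 metadata)
    or as a single integer. Returns None if no Events line is present.
    """
    match = re.search(r"^\s*Events\s*:\s*(\d+(?:\.\d+)?)\s*$",
                      examine_text, re.MULTILINE)
    if match is None:
        return None  # e.g. "No md superblock detected"
    return tuple(int(part) for part in match.group(1).split("."))


def stale_members(examine_by_device):
    """List devices whose event count is missing or behind the newest one."""
    counts = {dev: parse_events(text) for dev, text in examine_by_device.items()}
    known = [count for count in counts.values() if count is not None]
    if not known:
        return []
    newest = max(known)
    return sorted(dev for dev, count in counts.items()
                  if count is None or count < newest)


# Sample text copied from the thread; in practice this would be captured
# from `mdadm -E <device>` for each member.
outputs = {
    "/dev/sda5": "         Events : 0.3796224\n",
    "/dev/sdb5": "mdadm: No md superblock detected on /dev/sdb5.\n",
}
print(stale_members(outputs))  # prints ['/dev/sdb5']
```

A member flagged this way is only a candidate for investigation; the kernel log is still needed to explain why it fell behind.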
OK, after the resync completed, the disk is marked as faulty. Also, there are bundles of errors reported by dmesg, and the other partition on the drive which was ok is unreadable. So your earlier thought that there were IO errors was correct.

I will now try some system rebuilding!

FYI, the various outputs are appended.

Thanks for your help

Jon B

nas:~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda5[0] sdb5[4](F) sdd5[3] sdc5[2]
      733142016 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>

nas:~ # mdadm -E /dev/sda5
/dev/sda5:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : b54e46e1:b6a6e6ea:3ae5a5a5:04e207e4
  Creation Time : Fri Aug 4 22:42:14 2006
     Raid Level : raid5
  Used Dev Size : 244380672 (233.06 GiB 250.25 GB)
     Array Size : 733142016 (699.18 GiB 750.74 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Jun 20 13:05:54 2008
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : f11d55f5 - correct
         Events : 0.3796224

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        5        0      active sync   /dev/sda5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       0        0        1      faulty removed
   2     2       8       37        2      active sync   /dev/sdc5
   3     3       8       53        3      active sync   /dev/sdd5

mdadm -E /dev/sdb5
mdadm: No md superblock detected on /dev/sdb5.

ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata4.00: BMDMA stat 0x24
ata4.00: cmd 35/00:30:9a:e7:63/00:02:1b:00:00/e0 tag 0 cdb 0x0 data 286720 out
         res 61/04:01:e3:e8:63/04:00:1b:00:00/e0 Emask 0x1 (device error)
ata4.00: failed to set xfermode (err_mask=0x1)
ata4: failed to recover some devices, retrying in 5 secs
Marking TSC unstable due to: cpufreq changes.
Time: acpi_pm clocksource has been installed.
Clocksource tsc unstable (delta = -163018120 ns)
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: failed to set xfermode (err_mask=0x1)
ata4: limiting SATA link speed to 1.5 Gbps
ata4.00: limiting speed to UDMA/133:PIO3
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting link
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4.00: failed to set xfermode (err_mask=0x1)
ata4.00: disabled
ata4: EH complete
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 459532186
raid5: Disk failure on sdb5, disabling device. Operation continuing on 3 devices
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 459532746
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 459533570
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 883198
Buffer I/O error on device sdb2, logical block 98351
lost page write due to I/O error on sdb2
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1049150
Buffer I/O error on device sdb2, logical block 119095
lost page write due to I/O error on sdb2
Aborting journal on device sdb2.
journal commit I/O error
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 3:0:0:0: [sdb] READ CAPACITY failed
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
sd 3:0:0:0: [sdb] Sense not available.
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 00 00 00
sd 3:0:0:0: [sdb] Asking for cache data failed
sd 3:0:0:0: [sdb] Assuming drive cache: write through
md: md0: recovery done.
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:sda5
 disk 1, o:0, dev:sdb5
 disk 2, o:1, dev:sdc5
 disk 3, o:1, dev:sdd5
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:sda5
 disk 2, o:1, dev:sdc5
 disk 3, o:1, dev:sdd5
Buffer I/O error on device sdb2, logical block 98350
lost page write due to I/O error on sdb2
SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:17:31:4c:c2:28:00:09:5b:25:14:ee:08:00 SRC=212.13.194.96 DST=192.168.1.11 LEN=76 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=123 DPT=123 LEN=56
SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:17:31:4c:c2:28:00:09:5b:25:14:ee:08:00 SRC=212.13.194.96 DST=192.168.1.11 LEN=76 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=123 DPT=123 LEN=56
SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:17:31:4c:c2:28:00:09:5b:25:14:ee:08:00 SRC=212.13.194.96 DST=192.168.1.11 LEN=76 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=123 DPT=123 LEN=56
SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:17:31:4c:c2:28:00:09:5b:25:14:ee:08:00 SRC=212.13.194.96 DST=192.168.1.11 LEN=76 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=123 DPT=123 LEN=56
SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:17:31:4c:c2:28:00:09:5b:25:14:ee:08:00 SRC=212.13.194.96 DST=192.168.1.11 LEN=76 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=123 DPT=123 LEN=56
SFW2-INext-ACC-TCP IN=eth0 OUT= MAC=00:17:31:4c:c2:28:00:40:ca:3b:a6:05:08:00 SRC=192.168.1.12 DST=192.168.1.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=11827 DF PROTO=TCP SPT=27999 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A004B8D400000000001030306)
Buffer I/O error on device sdb5, logical block 488761344
Buffer I/O error on device sdb5, logical block 488761345
Buffer I/O error on device sdb5, logical block 488761346
Buffer I/O error on device sdb5, logical block 488761347
Buffer I/O error on device sdb5, logical block 488761348
Buffer I/O error on device sdb5, logical block 488761349
Buffer I/O error on device sdb5, logical block 488761350
Buffer I/O error on device sdb5, logical block 488761351
Buffer I/O error on device sdb5, logical block 488761344
Buffer I/O error on device sdb5, logical block 488761345

nas:~ # ll /var
ls: cannot access /var/adm: Input/output error
ls: cannot access /var/X11R6: Input/output error
total 52
d????????? ? ?     ?         ?                ? adm
drwxr-xr-x  8 root  root   4096 2007-11-21 23:42 cache
drwxrwxr-x  3 games games  4096 2007-10-28 23:23 games
drwxr-xr-x 20 root  root   4096 2007-11-01 23:09 lib
drwxrwxr-t  5 root  uucp   4096 2008-06-20 10:03 lock
drwxr-xr-x  8 root  root   4096 2008-06-20 10:02 log
drwx------  2 root  root  16384 2007-10-28 23:09 lost+found
lrwxrwxrwx  1 root  root     10 2007-10-28 23:09 mail -> spool/mail
drwxr-xr-x  2 root  root   4096 2007-09-21 23:04 opt
drwxr-xr-x 10 root  root   4096 2008-06-20 10:03 run
drwxr-xr-x  9 root  root   4096 2007-11-01 23:09 spool
drwxrwxrwt  4 root  root   4096 2008-06-20 00:03 tmp
d????????? ? ?     ?         ?                ? X11R6
nas:~ #
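
The /proc/mdstat snapshot in this message marks the failed member as sdb5[4](F) and the degraded array as [4/3] [U_UU]. For anyone who wants to watch for that state automatically, here is a minimal Python sketch that extracts both pieces of information from mdstat text. The function name `summarize_mdstat` and the constant `MDSTAT_SAMPLE` are hypothetical, the sample is the text posted above, and the parser assumes the two-line per-array layout shown there; other kernels may format the file differently.

```python
import re

# Sample copied from the message above; on a live system this would be
# read from /proc/mdstat instead.
MDSTAT_SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda5[0] sdb5[4](F) sdd5[3] sdc5[2]
      733142016 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
"""


def summarize_mdstat(text):
    """Map each md array to its failed members and its [U_] status string."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            # Members look like "sdb5[4]" optionally followed by flags "(F)".
            members = re.findall(r"(\w+)\[\d+\]((?:\([A-Z]\))*)",
                                 header.group(2))
            arrays[current] = {
                "failed": [name for name, flags in members if "(F)" in flags],
                "status": None,
            }
            continue
        if current is not None:
            # The status field is the bracketed run of U (up) and _ (missing).
            status = re.search(r"\[([U_]+)\]", line)
            if status:
                arrays[current]["status"] = status.group(1)
                current = None
    return arrays


print(summarize_mdstat(MDSTAT_SAMPLE))
# prints {'md0': {'failed': ['sdb5'], 'status': 'U_UU'}}
```

An underscore in the status string or a non-empty failed list would be the cue to collect `mdadm -E` and dmesg output before rebooting.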