Hello,

thanks for the hint. I am making a backup with dd before doing anything else; I hope I can get the data on the RAID back.

The following is in the syslog:

Jul 8 19:21:15 p3 kernel: Buffer I/O error on device dm-1, logical block 365625856
Jul 8 19:21:15 p3 kernel: lost page write due to I/O error on dm-1
Jul 8 19:21:15 p3 kernel: JBD: I/O error detected when updating journal superblock for dm-1.
Jul 8 19:21:15 p3 kernel: RAID conf printout:
Jul 8 19:21:15 p3 kernel:  --- level:5 rd:4 wd:2
Jul 8 19:21:15 p3 kernel:  disk 0, o:1, dev:sdf1
Jul 8 19:21:15 p3 kernel:  disk 1, o:1, dev:sde1
Jul 8 19:21:15 p3 kernel:  disk 2, o:1, dev:sdc1
Jul 8 19:21:15 p3 kernel:  disk 3, o:0, dev:sdd1
Jul 8 19:21:15 p3 kernel: RAID conf printout:
Jul 8 19:21:15 p3 kernel:  --- level:5 rd:4 wd:2
Jul 8 19:21:15 p3 kernel:  disk 0, o:1, dev:sdf1
Jul 8 19:21:15 p3 kernel:  disk 1, o:1, dev:sde1
Jul 8 19:21:15 p3 kernel:  disk 2, o:1, dev:sdc1
Jul 8 19:21:15 p3 kernel: md: recovery of RAID array md0
Jul 8 19:21:15 p3 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jul 8 19:21:15 p3 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jul 8 19:21:15 p3 kernel: md: using 128k window, over a total of 1465126400k.
Jul 8 19:21:15 p3 kernel: md: resuming recovery of md0 from checkpoint.

Going by the "RAID conf printout" above, I think the slot order is sdf1 sde1 sdc1 sdd1 - am I right? So, with sdc1 (the half-synced replacement) left out, I would have to run:

mdadm -C /dev/md1 -l5 -n4 -e 1.2 -c 512 /dev/sdf1 /dev/sde1 missing /dev/sdd1

The remaining question: should I also add --assume-clean?
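As for the dd backup mentioned above, what I am running is roughly the following, once per member disk (the image path is just a placeholder for my backup disk, it is not from this thread):

# /mnt/backup/sdd1.img is an example target, adjust to the real backup location
dd if=/dev/sdd1 of=/mnt/backup/sdd1.img bs=1M conv=noerror,sync

conv=noerror,sync makes dd continue past unreadable sectors and pad the short reads with zeros, so the image keeps its alignment; for the disk that is actually failing, GNU ddrescue is probably the better tool, since it retries bad areas and keeps a log of them.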
Thanks!
Dietrich

On 09.07.2012 02:12, "NeilBrown" <neilb@xxxxxxx> wrote:
>
> On Sun, 8 Jul 2012 21:05:02 +0200 Dietrich Heise <dh@xxxxxxx> wrote:
>
> > Hi,
> >
> > I have the following problem: one of four drives had S.M.A.R.T. errors,
> > so I removed it and replaced it with a new one.
> >
> > While that drive was rebuilding, one of the three remaining devices
> > got an I/O error (sdd1) (sdc1 was the replacement drive and was syncing).
> >
> > Now the following happened (two drives are spare drives :( )
>
> It looks like you tried to --add /dev/sdd1 back in after it failed, and mdadm
> let you. Newer versions of mdadm will refuse, as that is not a good thing to
> do, but it shouldn't stop you getting your data back.
>
> First thing to realise is that you could have data corruption. There is at
> least one block in the array which cannot be recovered, possibly more;
> i.e. any block on sdd1 which is bad, and any block at the same offset in sdc1.
> These blocks may not be in files, which would be lucky, or they may contain
> important metadata, which might mean you've lost lots of files.
>
> If you hadn't tried to --add /dev/sdd1 you could just force-assemble the
> array back to degraded mode (without sdc1) and back up any critical data.
> As sdd1 now thinks it is a spare, you need to re-create the array instead:
>
> mdadm -S /dev/md1
> mdadm -C /dev/md1 -l5 -n4 -e 1.2 -c 512 /dev/sdf1 /dev/sde1 /dev/sdd1 missing
> or
> mdadm -C /dev/md1 -l5 -n4 -e 1.2 -c 512 /dev/sdf1 /dev/sde1 missing /dev/sdd1
>
> depending on whether sdd1 was the 3rd or 4th device in the array - I cannot
> tell from the output here.
>
> You should then be able to mount the array and back up your data.
>
> You then want to use 'ddrescue' to copy sdd1 onto a device with no bad
> blocks, and assemble the array using that device instead of sdd1.
>
> Finally, you can add the new spare (sdc1) to the array and it should rebuild
> successfully - provided there are no bad blocks on sdf1 or sde1.
>
> I hope that makes sense. Do ask if anything is unclear.
>
> NeilBrown
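Putting Neil's steps together, the sequence I understand I should run is roughly this - /dev/sdg1 and /mnt/recovery are placeholders I made up (the disk with no bad blocks that sdd1 gets rescued onto, and a scratch mount point), and the device order still has to be confirmed:

mdadm -S /dev/md1
mdadm -C /dev/md1 -l5 -n4 -e 1.2 -c 512 /dev/sdf1 /dev/sde1 missing /dev/sdd1
mount -o ro /dev/md1 /mnt/recovery   # mount read-only, back up critical data now
umount /mnt/recovery
mdadm -S /dev/md1
ddrescue -f /dev/sdd1 /dev/sdg1 sdd1-rescue.log   # /dev/sdg1 = placeholder for a disk with no bad blocks
mdadm -A /dev/md1 /dev/sdf1 /dev/sde1 /dev/sdg1   # assemble with the rescued copy instead of sdd1
mdadm /dev/md1 --add /dev/sdc1   # finally re-add the spare so it can rebuild

Does that look right?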
> > p3 disks # mdadm -D /dev/md1
> > /dev/md1:
> >         Version : 1.2
> >   Creation Time : Mon Feb 28 19:57:56 2011
> >      Raid Level : raid5
> >   Used Dev Size : 1465126400 (1397.25 GiB 1500.29 GB)
> >    Raid Devices : 4
> >   Total Devices : 4
> >     Persistence : Superblock is persistent
> >
> >     Update Time : Sun Jul 8 20:37:12 2012
> >           State : active, FAILED, Not Started
> >  Active Devices : 2
> > Working Devices : 4
> >  Failed Devices : 0
> >   Spare Devices : 2
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >            Name : p3:0 (local to host p3)
> >            UUID : 6d4ebfd4:491bcb50:d98d5e67:f226f362
> >          Events : 121205
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8       81        0      active sync   /dev/sdf1
> >        1       8       65        1      active sync   /dev/sde1
> >        2       0        0        2      removed
> >        3       0        0        3      removed
> >
> >        4       8       49        -      spare   /dev/sdd1
> >        5       8       33        -      spare   /dev/sdc1
> >
> > Here is more information:
> >
> > p3 disks # mdadm -E /dev/sdc1
> > /dev/sdc1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x0
> >      Array UUID : 6d4ebfd4:491bcb50:d98d5e67:f226f362
> >            Name : p3:0 (local to host p3)
> >   Creation Time : Mon Feb 28 19:57:56 2011
> >      Raid Level : raid5
> >    Raid Devices : 4
> >
> >  Avail Dev Size : 2930275057 (1397.26 GiB 1500.30 GB)
> >      Array Size : 8790758400 (4191.76 GiB 4500.87 GB)
> >   Used Dev Size : 2930252800 (1397.25 GiB 1500.29 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >           State : active
> >     Device UUID : caefb029:526187ef:2051f578:db2b82b7
> >
> >     Update Time : Sun Jul 8 20:37:12 2012
> >        Checksum : 18e2bfe1 - correct
> >          Events : 121205
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >     Device Role : spare
> >     Array State : AA.. ('A' == active, '.' == missing)
> >
> > p3 disks # mdadm -E /dev/sdd1
> > /dev/sdd1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x0
> >      Array UUID : 6d4ebfd4:491bcb50:d98d5e67:f226f362
> >            Name : p3:0 (local to host p3)
> >   Creation Time : Mon Feb 28 19:57:56 2011
> >      Raid Level : raid5
> >    Raid Devices : 4
> >
> >  Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
> >      Array Size : 8790758400 (4191.76 GiB 4500.87 GB)
> >   Used Dev Size : 2930252800 (1397.25 GiB 1500.29 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >           State : active
> >     Device UUID : 4231e244:60e27ed4:eff405d0:2e615493
> >
> >     Update Time : Sun Jul 8 20:37:12 2012
> >        Checksum : 4bec6e25 - correct
> >          Events : 0
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >     Device Role : spare
> >     Array State : AA.. ('A' == active, '.' == missing)
> >
> > p3 disks # mdadm -E /dev/sde1
> > /dev/sde1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x0
> >      Array UUID : 6d4ebfd4:491bcb50:d98d5e67:f226f362
> >            Name : p3:0 (local to host p3)
> >   Creation Time : Mon Feb 28 19:57:56 2011
> >      Raid Level : raid5
> >    Raid Devices : 4
> >
> >  Avail Dev Size : 2930253889 (1397.25 GiB 1500.29 GB)
> >      Array Size : 8790758400 (4191.76 GiB 4500.87 GB)
> >   Used Dev Size : 2930252800 (1397.25 GiB 1500.29 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >           State : active
> >     Device UUID : 28b08f44:4cc24663:84d39337:94c35d67
> >
> >     Update Time : Sun Jul 8 20:37:12 2012
> >        Checksum : 15faa8a1 - correct
> >          Events : 121205
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >     Device Role : Active device 1
> >     Array State : AA.. ('A' == active, '.' == missing)
> >
> > p3 disks # mdadm -E /dev/sdf1
> > /dev/sdf1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x0
> >      Array UUID : 6d4ebfd4:491bcb50:d98d5e67:f226f362
> >            Name : p3:0 (local to host p3)
> >   Creation Time : Mon Feb 28 19:57:56 2011
> >      Raid Level : raid5
> >    Raid Devices : 4
> >
> >  Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
> >      Array Size : 8790758400 (4191.76 GiB 4500.87 GB)
> >   Used Dev Size : 2930252800 (1397.25 GiB 1500.29 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >           State : active
> >     Device UUID : 78d5600a:91927758:f78a1cea:3bfa3f5b
> >
> >     Update Time : Sun Jul 8 20:37:12 2012
> >        Checksum : 7767cb10 - correct
> >          Events : 121205
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >     Device Role : Active device 0
> >     Array State : AA.. ('A' == active, '.' == missing)
> >
> > Is there a way to repair the raid?
> >
> > thanks!
> > Dietrich