Hi folks,

the dd_rescue to the new HDD took 14 hours. It looks like ddrescue is not reading and writing in parallel. In the end 8 KB couldn't be read after 10 retries.

I just force-assembled the RAID with the new drive, but it failed almost immediately with a WRITE FPDMA QUEUED error on one of the other drives (sdj, formerly sdi). I immediately tried again, and this time one disk was rejected but the RAID started on 8 devices. Then xfs_repair failed when one of the disks failed with a READ FPDMA QUEUED error :( and md expelled the disk from the RAID.

It looks more like a controller problem, as all the messages coming from the drives on the PCIe Marvell controller include the line

ataXX: illegal qc_active transition (00000002->00000003)

I found only one similar report about that problem:
http://marc.info/?l=linux-ide&m=131475722021117

Any recommendations for a decent and affordable SATA controller with at least 4 ports and faster than PCIe x1? It looks like there are only Marvells and more expensive enterprise RAID controllers.

Currently the RAID is running clean, but degraded. The filesystem is mounted ro and looks healthy. I attached the output of mdadm --detail and put the kernel logs since yesterday at
http://evilazrael.net/bilder2/logs/kernel_20130203.log and
http://evilazrael.net/bilder2/logs/kernel_20130203.log.gz

I think my action plan is:
- Get a reliable controller ASAP
- Re-add the missing disk
- Upgrade to RAID 6
- Schedule regular scrubbing

Thanks for all the help so far, I think I can see the light at the end of the tunnel :)

On 03.02.2013 02:22, Phil Turmel wrote:
>> How do the serial numbers help?
>
> It is vital to keep track of raid device number (logical position in the
> array) versus drive serial numbers, as device names are not guaranteed
> to be consistent between boots (and certainly not when mucking around
> with cables and connectors).
>

I am aware of that problem when plugging drives around or adding new ones during runtime.

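For illustration, the name-to-serial bookkeeping discussed here could be sketched as below. This is a minimal sketch, not the lsdrv utility mentioned in the thread. It assumes udev populates /dev/disk/by-id with "ata-<model>_<serial>" symlinks, as common Linux distributions do; the function name map_serials_to_devices is hypothetical.

```python
import os


def map_serials_to_devices(by_id_dir="/dev/disk/by-id"):
    """Return {by-id name: resolved kernel device path} for whole ATA disks.

    Assumption: udev creates "ata-<model>_<serial>" links in by_id_dir and
    partition links carry a "-partN" suffix.
    """
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Keep whole-disk ATA entries only; skip wwn-*, scsi-* and partitions.
        if not name.startswith("ata-") or "-part" in name:
            continue
        # Resolve the symlink to the current kernel name, e.g. /dev/sdc.
        mapping[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return mapping
```

Calling map_serials_to_devices() after each reboot or recabling and comparing the result with the RaidDevice column of mdadm --detail shows which physical drive currently sits behind which sdX name.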
> When you are done with dd_rescue, make sure of the mapping again.
> lsdrv[1] gives you both pieces of information in one utility, you might
> find it easier than mapping by hand.

The owner's name sounds familiar ;) Will send you a mail later.

Kind regards

Christoph Nelles

-- 
Christoph Nelles

E-Mail  : evilazrael@xxxxxxxxxxxxx
Jabber  : eazrael@xxxxxxxxxxxxxx
ICQ     : 78819723

PGP-Key : ID 0x424FB55B on subkeys.pgp.net
          or http://evilazrael.net/pgp.txt

/dev/md0:
        Version : 1.2
  Creation Time : Fri Apr 27 20:25:04 2012
     Raid Level : raid5
     Array Size : 23442114560 (22356.14 GiB 24004.73 GB)
  Used Dev Size : 2930264320 (2794.52 GiB 3000.59 GB)
   Raid Devices : 9
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Sun Feb  3 16:30:02 2013
          State : clean, degraded
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : router:0  (local to host router)
           UUID : 6b21b3ed:d39d5a54:d4939113:77851cb6
         Events : 27770

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       0        0        1      removed
       2       8      129        2      active sync   /dev/sdi1
       3       8       49        3      active sync   /dev/sdd1
       4       8      145        4      active sync   /dev/sdj1
       5       8       97        5      active sync   /dev/sdg1
       6       8       17        6      active sync   /dev/sdb1
       7       8       81        7      active sync   /dev/sdf1
       8       8       65        8      active sync   /dev/sde1
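
As a cross-check on the figures above, and to see what the planned RAID 6 upgrade could mean for capacity, here is a minimal sketch of the arithmetic. It treats the mdadm sizes as KiB. The two RAID 6 scenarios (same nine drives, or one added drive) are assumptions for illustration; the mail does not say which is intended. The helper name usable_kib is hypothetical.

```python
def usable_kib(n_devices, used_dev_size_kib, parity_devices):
    """Usable capacity in KiB: data devices times per-device size."""
    return (n_devices - parity_devices) * used_dev_size_kib


# Values taken from the mdadm --detail output above.
USED_DEV_SIZE_KIB = 2930264320
RAID_DEVICES = 9

# RAID 5 spends one device's worth of space on parity.
raid5 = usable_kib(RAID_DEVICES, USED_DEV_SIZE_KIB, 1)

# Assumed scenario A: convert to RAID 6 on the same nine drives.
raid6_same_drives = usable_kib(RAID_DEVICES, USED_DEV_SIZE_KIB, 2)

# Assumed scenario B: add a tenth drive and convert to RAID 6.
raid6_one_more = usable_kib(RAID_DEVICES + 1, USED_DEV_SIZE_KIB, 2)

for label, kib in [
    ("raid5, 9 devices", raid5),
    ("raid6, 9 devices", raid6_same_drives),
    ("raid6, 10 devices", raid6_one_more),
]:
    print(f"{label:18s} {kib:>12d} KiB  {kib / 2**20:9.2f} GiB")
```

The RAID 5 result equals the reported Array Size of 23442114560. In scenario A the array would lose one drive's worth of usable space, while scenario B keeps the current size.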