Re: RAID5 with 2 drive failure at the same time

Hi folks,

The dd_rescue to the new HDD took 14 hours. It looks like dd_rescue is not
reading and writing in parallel. In the end, 8 kB couldn't be read after
10 retries.

I just force-assembled the RAID with the new drive, but it failed almost
immediately with a WRITE FPDMA QUEUED error on one of the other drives
(sdj, formerly sdi). I immediately tried again, and this time one disk
was rejected but the RAID started on 8 devices. However, xfs_repair then
failed when one of the disks hit a READ FPDMA QUEUED error :( and md
expelled the disk from the RAID.



It looks more like a controller problem, as all the messages coming from
the drives on the PCIe Marvell contain the line
ataXX: illegal qc_active transition (00000002->00000003)
I found only one similar report about that problem:
http://marc.info/?l=linux-ide&m=131475722021117

Any recommendations for a decent and affordable SATA Controller with at
least 4 ports and faster than PCIe x1? Looks like there are only
Marvells and more expensive Enterprise RAID controllers.



Currently the RAID is running clean, but degraded. The filesystem is
mounted ro and looks healthy. I attached the output of mdadm --detail and
put the kernel logs since yesterday at
http://evilazrael.net/bilder2/logs/kernel_20130203.log and
http://evilazrael.net/bilder2/logs/kernel_20130203.log.gz

I think my action plan is:
- Get reliable controller ASAP
- Re-add the missing disk
- Upgrade to RAID 6
- Schedule regular scrubbing
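
For the scrubbing item, a minimal sketch of what a scheduled job could do,
assuming the array is md0 and that the kernel exposes the usual md sysfs
file sync_action. The function name start_scrub and the sysfs_root
parameter are made up for illustration. Writing "check" requests a
read-only consistency pass; the sketch does nothing if the array is
already busy (for example resyncing or recovering).

```python
from pathlib import Path


def start_scrub(md_name="md0", sysfs_root="/sys/block"):
    """Request a read-only md consistency check if the array is idle.

    Returns the action that is in effect afterwards: "check" if a scrub
    was requested, otherwise whatever the array was already doing.
    md_name and sysfs_root are assumptions; adjust them to the system.
    """
    action_file = Path(sysfs_root) / md_name / "md" / "sync_action"
    current = action_file.read_text().strip()
    if current != "idle":
        # A resync, recovery or check is already running; leave it alone.
        return current
    action_file.write_text("check\n")
    return "check"
```

Run as root from cron or a timer, e.g. once a month, by calling
start_scrub("md0"). After the check finishes, md/mismatch_cnt in the same
sysfs directory reports how many sectors did not match.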

Thanks for all the help so far, I think I can see the light at the end
of the tunnel :)


On 03.02.2013 02:22, Phil Turmel wrote:
>> How do the serial numbers help?
> 
> It is vital to keep track of raid device number (logical position in the
> array) versus drive serial numbers, as device names are not guaranteed
> to be consistent between boots (and certainly not when mucking around
> with cables and connectors).
> 

I am aware of that problem when plugging drives around or adding new
ones during runtime.
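
A rough sketch of how the name-to-serial mapping could be recorded before
and after recabling. It is not lsdrv, only an illustration. It assumes
udev populates /dev/disk/by-id with symlinks whose names embed the model
and serial number, and that partition links end in -partN. The function
name map_ids_to_devices is hypothetical.

```python
import os
import re
from pathlib import Path


def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Map kernel device names (e.g. 'sdc') to their persistent ID names.

    Only whole-disk links are reported; partition links such as
    '...-part1' are skipped. The directory layout is an assumption
    about how udev is configured on the system.
    """
    mapping = {}
    for link in sorted(Path(by_id_dir).iterdir()):
        if not link.is_symlink():
            continue
        if re.search(r"-part\d+$", link.name):
            continue  # partition entry, not the disk itself
        # Links look like ../../sdc; keep only the final component.
        device = os.path.basename(os.readlink(link))
        mapping.setdefault(device, []).append(link.name)
    return mapping
```

Printing the result of map_ids_to_devices() next to the RaidDevice column
of mdadm --detail gives a table that stays valid when sdX names move.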

> When you are done with dd_rescue, make sure of the mapping again.
> lsdrv[1] gives you both pieces of information in one utility, you might
> find it easier than mapping by hand.

The owner's name sounds familiar ;) Will send you a mail later.



Kind regards

Christoph Nelles


-- 
Christoph Nelles

E-Mail    : evilazrael@xxxxxxxxxxxxx
Jabber    : eazrael@xxxxxxxxxxxxxx      ICQ       : 78819723

PGP-Key   : ID 0x424FB55B on subkeys.pgp.net
            or http://evilazrael.net/pgp.txt

/dev/md0:
        Version : 1.2
  Creation Time : Fri Apr 27 20:25:04 2012
     Raid Level : raid5
     Array Size : 23442114560 (22356.14 GiB 24004.73 GB)
  Used Dev Size : 2930264320 (2794.52 GiB 3000.59 GB)
   Raid Devices : 9
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Sun Feb  3 16:30:02 2013
          State : clean, degraded
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : router:0  (local to host router)
           UUID : 6b21b3ed:d39d5a54:d4939113:77851cb6
         Events : 27770

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       0        0        1      removed
       2       8      129        2      active sync   /dev/sdi1
       3       8       49        3      active sync   /dev/sdd1
       4       8      145        4      active sync   /dev/sdj1
       5       8       97        5      active sync   /dev/sdg1
       6       8       17        6      active sync   /dev/sdb1
       7       8       81        7      active sync   /dev/sdf1
       8       8       65        8      active sync   /dev/sde1
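
As a sanity check on the numbers above, and to see what the planned RAID 6
conversion would mean for capacity, here is a small arithmetic sketch. It
assumes mdadm reports Array Size and Used Dev Size in KiB, and that usable
space is (devices minus parity devices) times the per-device size. The
function name md_array_size_kib is made up for illustration.

```python
def md_array_size_kib(raid_devices, used_dev_size_kib, parity_devices):
    """Usable array size in KiB for a parity RAID (1 = RAID 5, 2 = RAID 6)."""
    return (raid_devices - parity_devices) * used_dev_size_kib


USED_DEV_SIZE_KIB = 2930264320  # "Used Dev Size" from the output above

# Current layout: RAID 5 across 9 devices.
raid5_9 = md_array_size_kib(9, USED_DEV_SIZE_KIB, 1)
assert raid5_9 == 23442114560  # matches the reported "Array Size"

# RAID 6 on the same 9 devices loses one device's worth of space.
raid6_9 = md_array_size_kib(9, USED_DEV_SIZE_KIB, 2)

# RAID 6 with a 10th device keeps the current capacity.
raid6_10 = md_array_size_kib(10, USED_DEV_SIZE_KIB, 2)

for label, kib in [("RAID 5, 9 devices", raid5_9),
                   ("RAID 6, 9 devices", raid6_9),
                   ("RAID 6, 10 devices", raid6_10)]:
    print(f"{label}: {kib} KiB ({kib / 1024**2:.2f} GiB)")
```

So keeping the existing filesystem size after moving to RAID 6 would need
one additional drive of the same size.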
