Re: 4 out of 16 drives show up as 'removed'

On 7 December 2011 20:42, Eli Morris <ermorris@xxxxxxxx> wrote:
> Hi All,
>
> I thought maybe someone could help me out. I have a 16 disk software RAID that we use for backup. This is at least the second time this has happened: all at once, four of the drives report as 'removed' when none of them actually were. These drives also disappeared from the 'lsscsi' list until I restarted the disk expansion chassis where they live.
>
> These are the dreaded Caviar Green drives. We bought 16 of them as an upgrade for a hardware RAID originally, because the tech from that company said they would work fine. After running them for a while, four drives dropped out of that array. So I put them in the software RAID expansion chassis they are in now, thinking I might have better luck. In this configuration, this happened once before. That time, the affected drives all appeared to have significant numbers of bad sectors, so I got them replaced and thought that might have been the problem all along. Now it has happened again. So I have two fairly predictable questions and I'm hoping someone might be able to offer a suggestion:
>
> 1) Any ideas on how to get this array working again without starting from scratch? It's all backup data, so it's not do or die, but it is also 30 TB and I really don't want to rebuild the whole thing again from scratch.
>
> I tried the re-add command and the error was something like 'not allowed'
>
> 2) Any idea on how to stop this from happening again? I was thinking of playing with the disk timeout in the OS (not the one on the drive firmware).
>
> If anyone can help, I'd greatly appreciate it, because, at this point, I have no idea what to do about this mess.
>
> Thanks!
>
> Eli
>
>
> [root@stratus ~]# mdadm --detail /dev/md5
> /dev/md5:
>        Version : 1.2
>  Creation Time : Wed Oct 12 16:32:41 2011
>     Raid Level : raid5
>  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
>   Raid Devices : 16
>  Total Devices : 13
>    Persistence : Superblock is persistent
>
>    Update Time : Mon Dec  5 12:52:46 2011
>          State : active, FAILED, Not Started
>  Active Devices : 12
> Working Devices : 13
>  Failed Devices : 0
>  Spare Devices : 1
>
>         Layout : left-symmetric
>     Chunk Size : 512K
>
>           Name : stratus.pmc.ucsc.edu:5  (local to host stratus.pmc.ucsc.edu)
>           UUID : 3189ca06:ccf973d0:7ef41366:98a75a32
>         Events : 32
>
>    Number   Major   Minor   RaidDevice State
>       0       8        1        0      active sync   /dev/sda1
>       1       0        0        1      removed
>       2       8       33        2      active sync   /dev/sdc1
>       3       8       49        3      active sync   /dev/sdd1
>       4       8       65        4      active sync   /dev/sde1
>       5       8       81        5      active sync   /dev/sdf1
>       6       8       97        6      active sync   /dev/sdg1
>       7       8      113        7      active sync   /dev/sdh1
>       8       0        0        8      removed
>       9       8      145        9      active sync   /dev/sdj1
>      10       8      161       10      active sync   /dev/sdk1
>      11       8      177       11      active sync   /dev/sdl1
>      12       8      193       12      active sync   /dev/sdm1
>      13       8      209       13      active sync   /dev/sdn1
>      14       0        0       14      removed
>      15       0        0       15      removed
>
>      16       8      225        -      spare   /dev/sdo1
> [root@stratus ~]#
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Hi,

To eliminate bad disks, can you post the smartctl -a output of all the
removed drives? (if you can get the OS to see them again)

Also, do you have any log files from when this happened? (kernel log,
dmesg, syslog etc)
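
In case it helps with gathering that information, here is a rough sketch of
two small shell helpers. It is only an illustration, not a tested procedure:
the function names removed_slots and collect_smart and the smart-*.txt output
file names are made up for this example, and the device names have to be
filled in by hand once the OS can see the drives again.

```shell
#!/bin/sh
# Hypothetical helpers; the names and output file names are illustrative only.

# Read `mdadm --detail` output on stdin and print the RaidDevice slot
# numbers of the rows whose state is "removed".
removed_slots() {
    awk '$NF == "removed" && $1 ~ /^[0-9]+$/ { print $4 }'
}

# Run `smartctl -a` on each device passed as an argument and save the
# report to one file per device. smartctl can exit nonzero for several
# reasons, so a nonzero status is only noted, not treated as fatal.
collect_smart() {
    for dev in "$@"; do
        out="smart-$(basename "$dev").txt"
        smartctl -a "$dev" > "$out" 2>&1 ||
            echo "smartctl exited nonzero for $dev (see $out)" >&2
    done
}
```

Possible usage, run as root: `mdadm --detail /dev/md5 | removed_slots` to
list the affected slots, `collect_smart` followed by the device names of the
dropped drives, and `dmesg > dmesg.txt` to capture the current kernel log.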

Regards,
Mathias