Re: Adding a new disk after disk failure on raid6 volume


 



On 20/12/2011 09:21, Robin Hill wrote:
On Tue Dec 20, 2011 at 09:46:13AM +0100, BERTRAND Joël wrote:

Hello,

I have been using several softraid volumes for a very long time. Last week,
a disk crashed on a raid6 volume and I tried to replace the faulty disk.
Today, when Linux boots, it only assembles this volume if the new disk
is marked as 'faulty' or 'removed', and I don't understand why...

The system is a sparc64 SMP server running Linux Debian/testing:

Root rayleigh:[~]>  uname -a
Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
GNU/Linux
Root rayleigh:[~]>  dpkg-query -l | grep mdadm
ii  mdadm                                 3.2.2-1

The faulty device is /dev/sde1:

Root rayleigh:[~]>  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
        359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]

All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
each disk contains only one partition (type FD, Linux raid autodetect). If I
add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
and my raid6 runs with all disks. But I get the same superblock on
/dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
one disappears as well (I think both superblocks are actually the same).

<- SNIP info ->

All disks return the same information except /dev/sde when it is running
(mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return the same
information). What is my mistake? Is this a known issue?

It's a known issue with 0.90 superblocks, yes. There's no information in
the superblock which allows md to tell whether it's on the partition or
the disk, so for full-disk partitions the same superblock could be valid
for both. 1.x superblocks contain extra information which can be used to
differentiate between the two. I'm a little surprised that the other
drives don't get detected in the same way, though.

I think the above issue only occurs on partitions with particular alignments (IIRC, when the partition start lines up with the 64KiB alignment the 0.90 superblock uses). Old fdisk would always create the first partition starting at sector 63, and that was the case in the output we saw for /dev/sdc, but a newer fdisk will likely create the partition starting at sector 2048, which might explain why the freshly partitioned /dev/sde shows the problem while the older drives don't.
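To make that concrete, here's one way to see whether the whole-disk and partition superblocks would land on the same sectors. This is only a sketch: it assumes the usual 0.90 layout (superblock at the device size rounded down to a 64KiB boundary, minus 64KiB) and the standard /sys/block paths, and the device names are just the ones from this thread.

  # 64KiB = 128 sectors of 512 bytes; "& ~127" rounds down to a 128-sector boundary
  disk_sz=$(blockdev --getsz /dev/sde)
  part_sz=$(blockdev --getsz /dev/sde1)
  part_start=$(cat /sys/block/sde/sde1/start)
  echo "whole-disk superblock at sector $(( (disk_sz & ~127) - 128 ))"
  echo "partition superblock at sector $(( part_start + (part_sz & ~127) - 128 ))"

If the two numbers come out the same, mdadm --examine will find the same superblock whether you point it at /dev/sde or /dev/sde1.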

Alternatively, or additionally, the problem may be that very old fdisk had a bug where it miscounted and didn't create partitions right up to the last "cylinder" of the disc, so the md metadata on the last partition wasn't in the same place as it would have been if it were for the whole disc.
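A quick way to check for that gap at the end of the disc (again just a sketch, same assumptions and device names as above):

  disk_sz=$(blockdev --getsz /dev/sde)
  part_end=$(( $(cat /sys/block/sde/sde1/start) + $(cat /sys/block/sde/sde1/size) ))
  echo "disk ends at sector $disk_sz, sde1 ends at sector $part_end"

If sde1 stops well short of the end of the disc, the partition superblock and the whole-disk superblock can't be on the same sectors.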

Either way, I would recommend that the OP --fail, --remove and --zero-superblock his /dev/sde1, copy a working partition table from sdc with `dd if=/dev/sdc of=/dev/sde bs=512 count=1`, run `blockdev --rereadpt /dev/sde`, check with `fdisk -lu /dev/sde` that there is now an sde1 identical to sdc1, and then --add the new /dev/sde1.
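Spelled out, that would be something like the following (a sketch only; the device names are the ones from this thread, and the dd step overwrites sde's partition table, so double-check them before running anything):

  mdadm /dev/md7 --fail /dev/sde1
  mdadm /dev/md7 --remove /dev/sde1
  mdadm --zero-superblock /dev/sde1
  # copy sdc's MBR/partition table onto the new disk
  dd if=/dev/sdc of=/dev/sde bs=512 count=1
  blockdev --rereadpt /dev/sde
  # confirm sde1 now exists and matches sdc1
  fdisk -lu /dev/sde
  mdadm /dev/md7 --add /dev/sde1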

Hope this helps!

Cheers,

John.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

