Re: Checksums wrong on one disk of mirror

Quoting Neil Brown <neilb@xxxxxxx>:
On Tuesday November 7, lists@xxxxxxxxx wrote:
Booting into a live CD, mdadm -E /dev/sdaX shows that the checksum is
not what would be expected for sda1,2,3 but is fine for sda6. All of
the checksums on drive sdb are correct.

I'm surprised it doesn't boot then.  How are the arrays being
assembled? A more complete kernel log would help.

Neil,

Thanks for such a quick reply. I will post the kernel logs if the below is not enough information. The old dmesg should also still be on the partition.

The state is "clean" for all partitions, working 2, active 2 and
failed 0. The table for sdb1,2,3 shows that the first device has been
removed and is no longer an active mirror.

What is the best way to proceed here? Can I somehow sync from the
second disk, which appears to have the correct checksums? Is there an
easy way to fix this that won't involve losing the data?

While booted from the live CD you should be able to
  mdadm -AR /dev/md0 /dev/sdb1
  mdadm /dev/md0 --add /dev/sda1
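
For reference, the same two steps with long option names, written as a dry-run sketch. The helper name `resync_plan` and the echo-only approach are illustrative assumptions and not part of mdadm; `-A` and `-R` are the short forms of `--assemble` and `--run`. Nothing is executed until the printed lines are run by hand.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# resync_plan is a hypothetical helper; device names are passed as arguments.
resync_plan() {
    md="$1"      # array device, e.g. /dev/md0
    good="$2"    # member with the valid superblock, e.g. /dev/sdb1
    stale="$3"   # member to be added back and resynced, e.g. /dev/sda1

    # Assemble and start the array degraded, from the good member only.
    echo "mdadm --assemble --run $md $good"
    # Add the other member back; the kernel then rebuilds onto it.
    echo "mdadm $md --add $stale"
    # Rebuild progress is reported in /proc/mdstat.
    echo "cat /proc/mdstat"
}

resync_plan /dev/md0 /dev/sdb1 /dev/sda1
```

Printing the plan first makes it easy to confirm which partition is treated as the good copy before anything touches the disks.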

Fantastic, this works well for two of the partitions. However the third has a bad sector (as reported by smartmontools) on the disk with the "good" superblock. The disk cannot read the sector, so the syncing fails and starts over at 15.7% each time.

Is it safe to mount that partition outside of the md, find the affected file, and remove it so that the disk can remap the sector (SMART currently shows it as Current_Pending_Sector), then resync the array? I guess this would cause problems and break the mirror. Or is the correct way to remove the drive with the "bad" superblock from the array, mount the md, remove the file, and then resync the array?

If it is possible to do either of the above, how do I stop the recovery? It now starts automatically at live CD boot, repeating from 15.7% over and over. My knowledge of the tools is limited, but I tried the following:

# mdadm /dev/md0 --remove /dev/sda1
and
# mdadm -f /dev/md0 --remove /dev/sda1 (no idea if the -f even makes sense there)
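
For comparison, mdadm's manage mode takes the array device first, followed by the per-device operations, and a member that is still in use generally has to be marked failed before it can be removed. A dry-run sketch under those assumptions follows. The helper name `detach_plan` is hypothetical, and whether mdadm 1.12 interrupts a running recovery this way is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# detach_plan is a hypothetical helper, not an mdadm feature.
detach_plan() {
    md="$1"      # array device, e.g. /dev/md0
    member="$2"  # member to take out of the array, e.g. /dev/sda1

    # Mark the member as faulty so the array stops using it.
    echo "mdadm $md --fail $member"
    # Remove the now-faulty member from the array.
    echo "mdadm $md --remove $member"
    # Inspect the resulting array state.
    echo "mdadm --detail $md"
}

detach_plan /dev/md0 /dev/sda1
```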

It is very odd that the checksums are all wrong though.  Kernel
version? mdadm version? hardware architecture?

Kernel installed from Ubuntu 6.06 sources, 2.6.15. Machine is an x86 Dell with two identical Maxtor DiamondMax drives on an Intel 82801 SATA controller.

mdadm is version 1.12. Compared with the most recent release this seems incredibly out of date, but it appears to be the default installed in Ubuntu. Even Debian stable seems to have 1.9. I can file a bug with them for an update if necessary.

Is it possible that a broken init script has tried to fsck an individual drive instead of the md? /etc/fstab only uses /dev/md* references but I'll check other scripts when (if? :) I get the system back up and running.

Whilst the machine is not critical and is only a new install, I'd like to keep fighting rather than give in if possible.

Thanks,

David
