Quoting Neil Brown <neilb@xxxxxxx>:
On Tuesday November 7, lists@xxxxxxxxx wrote:
Booting into a live CD, mdadm -E /dev/sdaX shows that the checksum is
not what would be expected for sda1,2,3 but is fine for sda6. All of
the checksums on drive sdb are correct.
I'm surprised it doesn't boot then. How are the arrays being
assembled? A more complete kernel log would help.
Neil,
Thanks for such a quick reply. I will post the kernel logs if the
below is not enough information. The old dmesg should also still be
on the partition.
The state is "clean" for all partitions, working 2, active 2 and
failed 0. The table for sdb1,2,3 shows that the first device has been
removed and is no longer an active mirror.
What is the best way to proceed here? Can I somehow sync from the
second disk, which appears to have the correct checksums? Is there an
easy way to fix this that won't involve losing the data?
While booted from the live CD you should be able to
mdadm -AR /dev/md0 /dev/sdb1
mdadm /dev/md0 --add /dev/sda1
Fantastic, this works well for two of the partitions. However the
third has a bad sector (as reported by smartmontools) on the disk with
the "good" superblock. The disk cannot read the sector, so the
syncing fails and starts over at 15.7% each time.
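To confirm that the rebuild really restarts at the same point, the progress figure can be read from /proc/mdstat. The following is a minimal sketch, assuming the usual mdstat layout in which each array stanza starts with a line like "md0 : active raid1 ..." and the progress line contains "recovery = NN.N%". The function name recovery_percent is made up for illustration.

```shell
#!/bin/sh
# Print the recovery/resync percentage of one array from mdstat-formatted
# text on stdin. Prints nothing if that array is not currently rebuilding.
# Usage (hypothetical helper): recovery_percent md0 < /proc/mdstat
recovery_percent() {
    awk -v md="$1" '
        $1 == md { in_array = 1; next }        # stanza header, e.g. "md0 : active raid1 ..."
        in_array && NF == 0 { in_array = 0 }   # a blank line ends the stanza
        in_array && /(recovery|resync) *=/ {
            # the percentage is the first field ending in "%"
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
        }
    '
}
```

Running it in a loop with a short sleep would show whether the figure climbs to 15.7 and then drops back.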
Is it safe to mount that partition outside of the md, find the file,
remove it so that the disk can remap that sector (it is shown as
Currently_Pending in SMART right now) then resync the array? I guess
this will cause problems and break the mirror. Or is the correct way
to remove the "bad" superblock drive from the array, mount the md,
remove the file then resync the array?
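For the "find the file" step, the failing LBA that SMART reports first has to be turned into a filesystem block number. The following is a rough sketch of that arithmetic, assuming 512-byte sectors, an old-style md superblock stored at the end of the partition (so the filesystem starts at the beginning of the partition), and an ext2/ext3 filesystem. None of these are confirmed in this thread, and the function name is made up for illustration.

```shell
#!/bin/sh
# Convert a whole-disk LBA into a filesystem block number.
# Arguments (all assumptions to be filled in from the real system):
#   $1  failing LBA as reported by smartctl
#   $2  first sector of the partition (from "fdisk -lu")
#   $3  filesystem block size in bytes (from "tune2fs -l" on ext2/ext3)
lba_to_fs_block() {
    # sector offset inside the partition, scaled from 512-byte sectors to fs blocks
    echo $(( ($1 - $2) * 512 / $3 ))
}

# On ext2/ext3 the resulting block could then be looked up with debugfs:
#   debugfs -R "icheck BLOCK" DEVICE    # block -> inode
#   debugfs -R "ncheck INODE" DEVICE    # inode -> path name
```

For example, an LBA of 1000063 on a partition starting at sector 63 with 4096-byte blocks gives block 125000.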
If it is possible to do either of the above, how do I stop the
recovery? It now starts automatically at live CD boot, repeating from
15.7% over and over. My knowledge of the tools is bad but I tried the
following:
# mdadm /dev/md0 --remove /dev/sda1
and
# mdadm -f /dev/md0 --remove /dev/sda1 (no idea if the -f even makes
sense there)
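For reference, mdadm's manage mode normally wants a member marked faulty before it can be removed, and takes the array as the first argument followed by the options. The following is a sketch of that form, not something confirmed in this thread to stop the automatic recovery. The wrapper name and the MDADM override (which only makes a dry run possible) are made up for illustration.

```shell
#!/bin/sh
# Mark a member faulty and remove it from the array in one mdadm call.
# Set MDADM=echo to print the arguments instead of running mdadm.
# Usage (hypothetical wrapper): fail_and_remove /dev/md0 /dev/sda1
fail_and_remove() {
    if [ $# -ne 2 ]; then
        echo "usage: fail_and_remove ARRAY MEMBER" >&2
        return 2
    fi
    # --fail (-f) marks the device faulty; --remove (-r) then detaches it
    "${MDADM:-mdadm}" "$1" --fail "$2" --remove "$2"
}
```

Here sda1 is the member being rebuilt, so failing it should abort the rebuild onto it and leave the array running degraded on sdb1.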
It is very odd that the checksums are all wrong though. Kernel
version? mdadm version? hardware architecture?
Kernel installed from Ubuntu 6.06 sources, 2.6.15. Machine is an x86
Dell with two identical Maxtor DiamondMax drives on an Intel 82801
SATA controller.
mdadm is version 1.12. Looking at the most recently available version
this seems incredibly out of date, but seems to be the default
installed in Ubuntu. Even Debian stable seems to have 1.9. I can bug
this with them for an update if necessary.
Is it possible that a broken init script has tried to fsck an
individual drive instead of the md? /etc/fstab only uses /dev/md*
references but I'll check other scripts when (if? :) I get the system
back up and running.
Whilst the machine is not critical and is only a new install, I'd like
to keep fighting rather than give in if possible.
Thanks,
David