Re: Help RAID5 reshape Oops / backup-file

----- Message from neilb@xxxxxxx ---------
    Date: Mon, 15 Oct 2007 09:31:23 +1000
    From: Neil Brown <neilb@xxxxxxx>
Reply-To: Neil Brown <neilb@xxxxxxx>
 Subject: Re: Help RAID5 reshape Oops / backup-file
      To: Nagilum <nagilum@xxxxxxxxxxx>
      Cc: linux-raid@xxxxxxxxxxxxxxx


On Sunday October 14, nagilum@xxxxxxxxxxx wrote:
> Can someone tell me if I'm on the right track?
> I've now noticed the following:
> # ~/mdadm-2.6.3/mdadm -v -A /dev/md0 /dev/sd[d-e]
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot -1.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot -1.
> mdadm: No suitable drives found for /dev/md0

Hmm... that might be useful..

I just found your earlier email where you said:

> After the machine came back up (on a rescue disk) I thought I'd
> simply have to go through the process again. So I use add add the
> new disk again.
> Although that worked, I am now unable to resume the growing
> process.

Using "add add" again was not correct, and should not have been
possible.
You should have simply assembled the array with the full new set of
devices.  Then reshape would have automatically restarted properly.

Can you remember *exactly* what you did?  If I can reproduce the
situation, I can find the best way to fix it and send you something to
try.

NeilBrown


----- End message from neilb@xxxxxxx -----

Sure, here it goes:
The system is running Debian Etch ia64, kernel 2.6.18.
Since the exact versions might be important in this case, I made copies of what I deemed relevant available online. A copy of the "linux/drivers/md" folder of that particular kernel can be found at:
  http://www.nagilum.de/md/md

Etch comes with mdadm-2.5.6 + Debian patches.
See http://www.nagilum.de/md/mdadm-2.5.6/debian/changelog
I made the whole Debian Package available here:
 http://www.nagilum.de/md/
 - "mdadm-2.5.6": the extracted source with Debian patches applied
 - "mdadm_2.5.6-9.diff.gz": the diff to mdadm_2.5.6.orig.tar.gz
 - "mdadm_2.5.6-9_i386.deb": the i386 version of the package; however, I was/am using mdadm_2.5.6-9_ia64.deb
 - "mdadm_2.5.6-9.dsc": the description file for building the .deb

The Raid was being reshaped from three to five drives when the shutdown was issued. I assume the shutdown went normally since the machine was off and there was no power interruption.
Upon booting the system it became apparent that the RAID was non-functional.
The system boots off of a USB stick and then mounts its root filesystem from the RAID. Assembling the RAID happens within the initrd. The relevant scripts can be found here: http://www.nagilum.de/md/local-top/
I booted a rescue disk which is based on the identical Linux version.
I looked at the "mdadm -Q --detail /dev/md0" output and saw only 3 of the 5 disks in the RAID. Then I did what I should not have done: I added the two new disks again, assuming that mdadm would not touch them in a harmful way (without --force) and would refuse if that was not the right way to add an active disk.
The disks were added but the reshape did not continue.
Up until now I can't think of anything else I did that could have changed something (and "mdadm -Q --detail /dev/md0" has looked the same ever since). I think what I should have done instead of adding those disks was to use --re-add and/or update /etc/mdadm/mdadm.conf. But then again, I never expected this to become so problematic. :(

By now I can also boot with 2.6.23 (I'll update to 2.6.23.1 shortly) and I have the latest mdadm tools (in parallel to the old ones). I also built the test_stripe utility and very briefly tried the "test" argument, but it wanted me to specify an existing file, so I chickened out. ;)
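
For reference, here is a minimal sketch of the "assemble with the full new set of devices" approach Neil describes, as opposed to adding the disks again. It only prints the command rather than running it, so it is safe to try on any machine. The helper name assemble_cmd is made up for illustration, and /dev/sda, /dev/sdb and /dev/sdc are placeholder names for the three original members, which I have not listed in this mail; only /dev/sdd and /dev/sde are the real new disks. Whether this would still work on an array in the current state is exactly what is unclear.

```shell
#!/bin/sh
# Dry-run sketch: build the mdadm assemble command for an array and its
# complete member list, and print it instead of executing it.
# assemble_cmd is a hypothetical helper, not part of mdadm.
assemble_cmd() {
    md_dev=$1
    shift
    # "mdadm --assemble <md-device> <member>..." is the long form of the
    # "-A" invocation used earlier in this thread.
    printf 'mdadm --assemble %s' "$md_dev"
    for dev in "$@"; do
        printf ' %s' "$dev"
    done
    printf '\n'
}

# /dev/sda, /dev/sdb, /dev/sdc are ASSUMED names for the original members.
assemble_cmd /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
```

Printing the command first makes it possible to double-check the device list before anything touches the superblocks.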
Thanks a lot for looking into this!
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@xxxxxxxxxxx \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..

