Re: Need urgent help in fixing raid5 array

Figured this out.  I had to stop md1 even though md couldn't assemble it.  The "good devices" were still running.
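
For anyone hitting the same "Device or resource busy" error: an array that failed to assemble can still be listed as inactive in /proc/mdstat and keep its member devices claimed until it is stopped (for example with mdadm --stop /dev/md1). The sketch below is only an illustration, assuming the usual /proc/mdstat layout. The function name and the sample text are made up for the example and are not taken from this system.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not from this thread).
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : inactive sdc1[2](S) sdd1[3](S)
      1953519872 blocks super 1.0

md0 : active raid1 sda1[0] sdh1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

def members_by_array(mdstat_text):
    """Map each member device name to (array, state) as listed in mdstat text."""
    holders = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md1 : inactive sdc1[2](S) ..."
        m = re.match(r"^(md\S*)\s*:\s*(\S+)\s*(.*)$", line)
        if not m:
            continue
        array, state, rest = m.groups()
        # Member entries look like "sdc1[2]"; the raid level has no "[n]" suffix.
        for dev in re.findall(r"(\S+?)\[\d+\]", rest):
            holders[dev] = (array, state)
    return holders
```

Passing the contents of the real /proc/mdstat to members_by_array() would show whether a device that mdadm reports as busy is still held by an inactive array.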

Will let you know how it goes.

thx
mike


----- Original Message ----
From: Mike Myers <mikesm559@xxxxxxxxx>
To: Neil Brown <neilb@xxxxxxx>
Cc: Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx; john lists <john4lists@xxxxxxxxx>
Sent: Monday, January 12, 2009 9:38:21 PM
Subject: Re: Need urgent help in fixing raid5 array

Ok, still more help needed.  I finally got enough time scheduled tonight to try recreating the raid array as per our earlier conversation.  When doing the create as you outlined in your earlier post, mdadm -C says the first two disks are part of an existing raid array (I assume this is a normal "error" for this sort of situation and will be ignored in the end), but for each of the last 4 devices I specify on the command line, it says:

mdadm: cannot open /dev/sdc1: Device or resource busy

(with the same error for each of the 4 devices).

The devices are online though.  I can do an mdadm --examine on them, dd from them, and do smartctl operations on them.  Why would md think they were busy?

Thx
Mike






----- Original Message ----
From: Neil Brown <neilb@xxxxxxx>
To: Mike Myers <mikesm559@xxxxxxxxx>
Cc: Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx; john lists <john4lists@xxxxxxxxx>
Sent: Tuesday, January 6, 2009 3:31:44 PM
Subject: Re: Need urgent help in fixing raid5 array

On Monday January 5, mikesm559@xxxxxxxxx wrote:
> BTW, in the original email I sent that had the --examine info for
> each of these array members, three devices have the same device UUID
> and array slot, and two of them share an older event count, and one
> has a slightly newer event count.  Which of these should be the real
> array slot 0?  And I notice that one of the members in that email
> had a device UUID that I can't find anymore (I suspect it's the
> current sdf1 that thinks it's part of md2).  In that email, it had
> array slot 4, which is one of the missing devices in the current
> family (that I assume --assemble would add as "3").  It also has
> 9663 hours on it, which makes it part of the original set of 4
> members for this raid5 array.  The drive in slot 5 only has 7630
> hours on it, so it should have been added later as part of a --grow
> operation. 
> 
> Does all that make sense?  If so, then sdb1 (which says it's slot
> 0), sdi1 (at 9671 hours, which also thinks it's slot 0), sdj1 (at
> 9194 hours, which also says it's 0), and sdf1 (at 9663 hours, which
> apparently used to think it was slot 4) should be the original 4
> drives of the array.  How can I figure out which is the real slot 0,
> and which are slots 1 and 2, if sdi1 and sdj1 have the same event
> count, array slot id (0), and device UUID?

I had noticed the slot number was repeated.  I hadn't noticed the
device uuid was the same, though I guess that makes sense.  Somehow
the superblock for one device has been written to the other devices.
It is not really possible to be sure which is the original without
knowing how this happened, though I suspect that the one with the
higher event count is more likely to be the original.

Being a software guy, I tend to like to blame hardware, and I wonder
if your problematic backplane managed to send write requests to the
wrong drive somehow.  If it did, then my expectation of your success
just went down a few notches. :-(

The only option for you to try to find out which device is which is to
try various combinations and see what gives you access to the most
consistent data.
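
As an illustration of "try various combinations", the orderings can be enumerated mechanically so that none is skipped or tried twice. This is only a sketch: the function name, the slot numbers, and the device names below are hypothetical placeholders, not the devices in this array. Each ordering would still have to be checked by hand for consistent data.

```python
from itertools import permutations

def candidate_orders(fixed, ambiguous, n_slots):
    """Yield device lists of length n_slots.

    fixed     -- dict mapping slot number -> device whose slot is known
    ambiguous -- devices whose slot is unknown (may include "missing")
    """
    open_slots = [s for s in range(n_slots) if s not in fixed]
    # Try every assignment of the ambiguous devices to the open slots.
    for perm in permutations(ambiguous, len(open_slots)):
        order = [None] * n_slots
        for slot, dev in fixed.items():
            order[slot] = dev
        for slot, dev in zip(open_slots, perm):
            order[slot] = dev
        yield order

# Hypothetical example: slot 0 is known, two devices are uncertain.
orders = list(candidate_orders({0: "devA"}, ["devB", "devC"], 3))
```

Here orders holds the two possible layouts, devA devB devC and devA devC devB. With three uncertain devices there would be six, so the number of trials grows quickly.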

> 
> This is way harder work than should be needed to fix a problem.  :-)
> But I am sure glad you gurus know how this stuff is supposed to
> work! 

I'm happy to help as much as I can... I just hope your hardware hasn't
done too much damage...

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



      