Re: What the heck happened to my array?

On 05/04/11 14:10, NeilBrown wrote:
>> - Reboot required to get system back.
>> - Restarted reshape with 9 drives.
>> - sdl suffered IO error and was kicked
>
> Very sad.

I'd say pretty damn unlucky actually.

>> - Array froze all IO.
>
> Same thing...
>
>> - Reboot required to get system back.
>> - Array will no longer mount with 8/10 drives.
>> - Mdadm 3.1.5 segfaults when trying to start reshape.
>
> Don't know why it would have done that... I cannot reproduce it easily.

No. I tried numerous incantations. The system version of mdadm is Debian 3.1.4. This segfaulted so I downloaded and compiled 3.1.5 which did the same thing. I then composed most of this E-mail, made *really* sure my backups were up to date and tried 3.2.1 which to my astonishment worked. It's been ticking along _slowly_ ever since.

>>     Naively tried to run it under gdb to get a backtrace but was unable
>> to stop it forking
>
> Yes, tricky .... an "strace -o /tmp/file -f mdadm ...." might have been
> enough, but too late to worry about that now.

I wondered about using strace but for some reason got it into my head that a gdb backtrace would be more useful. Then of course I got it started with 3.2.1 and have not tried again.
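
For anyone who does capture a trace the way Neil suggests, a rough sketch of pulling the crashing process out of it follows. The function name segv_pids is made up for illustration, and it assumes the usual strace -f output where each line starts with the PID and a fatal signal is reported as "+++ killed by SIGSEGV +++".

```shell
#!/bin/sh
# Capture step, exactly as suggested in the quoted text (arguments elided there):
#   strace -o /tmp/file -f mdadm ....

# segv_pids (hypothetical helper): print the PID of every traced process
# that strace reports as killed by SIGSEGV.
# $1 = path to the trace file written by "strace -o ... -f".
segv_pids() {
    # With -o and -f each line begins with the PID, so field 1 is the PID.
    awk '/\+\+\+ killed by SIGSEGV/ { print $1 }' "$1"
}
```

Calling segv_pids /tmp/file should then show which of the forked children died, and the lines for that PID just above the signal show the last system calls it made.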

>> - Got array started with mdadm 3.2.1
>> - Attempted to re-add sdd/sdl (now marked as spares)
>
> Hmm... it isn't meant to do that any more.  I thought I fixed it so that
> if a device looked like part of the array it wouldn't add it as a spare...
> Obviously that didn't work.  I'd better look into it again.

Now the chain of events that led up to this was along these lines.
- Rebooted machine.
- Tried to --assemble with 3.1.4
- mdadm told me it did not really want to continue with 8/10 devices and I should use --force if I really wanted it to try.
- I used --force
- I did a mdadm --add /dev/md0 /dev/sdd and the same for sdl
- I checked and they were listed as spares.

So this was all done with Debian's mdadm 3.1.4, *not* 3.1.5
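
The "listed as spares" check in the last step can be sketched roughly as below. The function name list_spares and its arguments are made up for illustration; it assumes the usual /proc/mdstat layout, where the array's member line starts with the array name and spare members carry an (S) suffix.

```shell
#!/bin/sh
# list_spares (hypothetical helper): print the members of an array that
# /proc/mdstat marks as spares.
# $1 = mdstat file to read (defaults to /proc/mdstat)
# $2 = array name          (defaults to md0)
list_spares() {
    mdstat="${1:-/proc/mdstat}"
    array="${2:-md0}"
    awk -v a="$array" '
        $1 == a {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /\(S\)$/) {
                    sub(/\[.*/, "", $i)   # drop the "[N](S)" suffix
                    print $i
                }
            }
        }' "$mdstat"
}
```

Run with no arguments after the --add, this would be expected to print sdd and sdl in the situation described above.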

>
> No, you cannot give it extra redundancy.
> I would suggest:
>    copy anything that you need off, just in case - if you can.
>
>    Kill the mdadm that is running in the background.  This will mean that
>    if the machine crashes your array will be corrupted, but you are thinking
>    of rebuilding it anyway, so that isn't the end of the world.
>    In /sys/block/md0/md
>       cat suspend_hi > suspend_lo
>       cat component_size > sync_max
>
> That will allow the reshape to continue without any backup. It will be
>    much faster (but less safe, as I said).

Well, I have nothing to lose, but I've just picked up some extra drives so I'll make second backups and then give this a whirl.
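
For reference, the two sysfs writes Neil describes could be wrapped up something like this. It is only a sketch: the function name is made up, the directory argument is an addition so the thing can be pointed at a scratch directory first, and it does not kill the background mdadm, which still has to be done by hand beforehand.

```shell
#!/bin/sh
# continue_reshape_without_backup (hypothetical name): perform the two
# writes from the quoted instructions.
# $1 = md sysfs directory (defaults to /sys/block/md0/md, as in the email)
continue_reshape_without_backup() {
    md_dir="${1:-/sys/block/md0/md}"

    # Refuse to do anything unless all four attribute files are present.
    for f in suspend_hi suspend_lo component_size sync_max; do
        if [ ! -e "$md_dir/$f" ]; then
            echo "missing $md_dir/$f" >&2
            return 1
        fi
    done

    # Same two commands as in the quoted text, with explicit paths.
    cat "$md_dir/suspend_hi" > "$md_dir/suspend_lo"
    cat "$md_dir/component_size" > "$md_dir/sync_max"
}
```

As Neil says, once this is done a crash mid-reshape means a corrupted array, so it only makes sense with the data already copied off.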

> If something goes wrong, you will need to scrap the array, recreate it, and
>    copy data back from where-ever you copied it to (or backups).

I did go into this with the niggling feeling that something bad might happen, so I made sure all my backups were up to date before I started. No biggie if it does die.

The very odd thing is I did a complete array check, plus SMART long tests on all drives, literally hours before I started the reshape. Goes to show how ropey these large drives can be in big(ish) arrays.

> If anything there doesn't make sense, or doesn't seem to work - please ask.
>
> Thanks for the report.  I'll try to get those mdadm issues addressed -
> particularly if you can get me the mdadm file which caused the segfault.
>

Well, luckily I preserved the entire build tree then. I was planning on running nm over the binary and having a two-thumbs sort of look at it with gdb, but seeing as you probably have a much better idea of what you are looking for, I'll just send you the binary!

Thanks for the help Neil. Much appreciated.


