Re: What the heck happened to my array?

NeilBrown <neilb@xxxxxxx> · Tue, 5 Apr 2011 16:10:43 +1000

On Tue, 05 Apr 2011 08:47:16 +0800 Brad Campbell <lists2009@xxxxxxxxxxxxxxx>
wrote:

> On 05/04/11 00:49, Roberto Spadim wrote:
> > I don't know, but this happened to me on an HP server with Linux
> > 2.6.37. I changed to an older kernel release and the problem went away.
> > Check with Neil and the other md guys what the real problem is;
> > maybe the realtime module and other changes inside the kernel are the
> > problem, maybe not...
> > Just a quick solution idea: try an older kernel.
> >
> 
> Quick precis:
> - Started reshape 512k to 64k chunk size.
> - sdd got bad sector and was kicked.
> - Array froze all IO.

That .... shouldn't happen.  But I know why it did.

mdadm forks and runs in the background, monitoring the reshape.
It suspends IO to a region of the array, backs up the data, then lets the
reshape progress over that region, then invalidates the backup and allows IO
to resume, then moves on to the next region (it actually has two regions in
different states at the same time, but you get the idea).

When the device failed, the reshape in the kernel aborted and then restarted.
It is meant to do this - restore to a known state, then decide if there is
anything useful to do.  It restarts exactly where it left off so all should
be fine.

mdadm periodically checks the value in 'sync_completed' to see how far the
reshape has progressed to know if it can move on.
If it checks while the reshape is temporarily aborted it sees 'none', which
is not a number, so mdadm aborts.  That is a bug in mdadm and should be fixed.
It aborts with IO to a region still suspended, so it is very possible for IO
to freeze if anything is destined for that region.
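
A minimal sketch of the kind of check that would avoid this: treat 'none'
as "not ready, look again later" rather than as a fatal error.  This is
shell for illustration only, not mdadm's actual code (mdadm is written in C).
The function name parse_sync_completed and its return codes are made up for
the example, and it assumes sync_completed reads either 'none' or
'completed / total'.

```shell
# Hypothetical helper: interpret one value read from sync_completed.
#   prints the completed count and returns 0 for "completed / total"
#   returns 2 for "none" (reshape idle or temporarily aborted: retry later)
#   returns 1 for anything it does not recognise
parse_sync_completed() {
    value=$1
    case $value in
        none)
            return 2
            ;;
        *' / '*)
            # Keep only the part before the first " / ".
            done_part=${value%% / *}
            case $done_part in
                ''|*[!0-9]*) return 1 ;;   # not a plain decimal number
            esac
            printf '%s\n' "$done_part"
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

A monitoring loop built on this would sleep and re-read the file on return
code 2, and only give up (after releasing any suspended region) on return
code 1.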

> - Reboot required to get system back.
> - Restarted reshape with 9 drives.
> - sdl suffered IO error and was kicked

Very sad.

> - Array froze all IO.

Same thing...

> - Reboot required to get system back.
> - Array will no longer mount with 8/10 drives.
> - Mdadm 3.1.5 segfaults when trying to start reshape.

Don't know why it would have done that... I cannot reproduce it easily.

>    Naively tried to run it under gdb to get a backtrace but was unable 
> to stop it forking

Yes, tricky .... an "strace -o /tmp/file -f mdadm ...." might have been
enough, but too late to worry about that now.

> - Got array started with mdadm 3.2.1
> - Attempted to re-add sdd/sdl (now marked as spares)

Hmm... it isn't meant to do that any more.  I thought I fixed it so that
if a device looked like part of the array it wouldn't add it as a spare...
Obviously that didn't work.  I'd better look into it again.

> [  304.393245] mdadm[5940]: segfault at 7f2000 ip 00000000004480d2 sp 
> 00007fffa04777b8 error 4 in mdadm[400000+64000]
> 

If you have the exact mdadm binary that caused this segfault we should be
able to figure out what instruction was at 0x4480d2.   If you don't feel up
to it, could you please email me the file privately and I'll have a look.
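
For anyone wanting to try this themselves, a rough sketch of the arithmetic.
The numbers come straight from the kernel log line quoted above; the helper
name fault_offset is made up for the example.  The binutils commands in the
trailing comments are only suggestions: "./mdadm" is a placeholder path, and
they are only meaningful when run against the exact binary that crashed.

```shell
# Hypothetical helper: offset of the faulting instruction inside the mapping.
#   $1 = instruction pointer from the log ("ip ...")
#   $2 = start of the mapping (the "400000" in "mdadm[400000+64000]")
fault_offset() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

fault_offset 0x4480d2 0x400000    # prints 0x480d2

# 0x480d2 is less than the mapping length 0x64000, so the fault is inside
# mdadm's own code rather than in a shared library.
#
# If the binary is linked at 0x400000, the ip can be used directly:
#   addr2line -f -e ./mdadm 0x4480d2
#   objdump -d --start-address=0x4480c0 --stop-address=0x4480f0 ./mdadm
```

addr2line only gives function and line information if the binary still has
symbols or debug info; otherwise the disassembly is the fallback.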

> root@srv:~/mdadm-3.1.5# uname -a
> Linux srv 2.6.38 #19 SMP Wed Mar 23 09:57:05 WST 2011 x86_64 GNU/Linux
> 
> Now. The array restarted with mdadm 3.2.1, but of course it's now 
> reshaping 8 out of 10 disks, has no redundancy and is going at 600k/s 
> which will take over 10 days. Is there anything I can do to give it some 
> redundancy while it completes or am I better to copy the data off, blow 
> it away and start again? All the important stuff is backed up anyway, I 
> just wanted to avoid restoring 8TB from backup if I could.

No, you cannot give it extra redundancy.
I would suggest:
  copy anything that you need off, just in case - if you can.

  Kill the mdadm that is running in the background.  This will mean that
  if the machine crashes your array will be corrupted, but you are thinking
  of rebuilding it anyway, so that isn't the end of the world.
  In /sys/block/md0/md
     cat suspend_hi > suspend_lo
     cat component_size > sync_max

  That will allow the reshape to continue without any backup.  It will be
  much faster (but less safe, as I said).

  If the reshape completes without incident, it will start recovering to the
  two 'spares' - and then you will have a happy array again.

  If something goes wrong, you will need to scrap the array, recreate it, and
  copy data back from wherever you copied it to (or backups).
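
The two sysfs writes above can be wrapped in a small function so that they
can be tried against a scratch directory first.  This is only a sketch: the
function name release_reshape is made up, it does nothing beyond the two
"cat" commands already given plus an existence check, and it carries the same
risk (no backup while the reshape runs).

```shell
# Hypothetical wrapper around the two writes described above.
#   $1 = md sysfs directory, e.g. /sys/block/md0/md on the real system
release_reshape() {
    md_dir=$1
    for f in suspend_lo suspend_hi component_size sync_max; do
        [ -e "$md_dir/$f" ] || { echo "missing $md_dir/$f" >&2; return 1; }
    done
    # Empty the suspended window by making suspend_lo equal to suspend_hi.
    cat "$md_dir/suspend_hi" > "$md_dir/suspend_lo" || return 1
    # Raise the limit on how far the reshape may proceed.
    cat "$md_dir/component_size" > "$md_dir/sync_max" || return 1
}

# Example (only after killing the background mdadm):
#   release_reshape /sys/block/md0/md
```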

If anything there doesn't make sense, or doesn't seem to work - please ask.

Thanks for the report.  I'll try to get those mdadm issues addressed -
particularly if you can get me the mdadm file which caused the segfault.

NeilBrown