Re: BUG?: RAID6 reshape hung in reshape_request

NeilBrown <neilb@xxxxxxx> · Wed, 29 Apr 2015 10:03:39 +1000

On Mon, 27 Apr 2015 12:20:50 -0500 David Wahler <dwahler@xxxxxxxxx> wrote:
> 
> I don't urgently need this array up and running, so I'm happy to leave
> it in its current state for the next few days in case there's anything
> else I can do to help track this down.

Thanks for the various status data.

I can reproduce the problem fairly easily.  I clearly never thought
about 'reshape' when I was writing the bad_block handling.

You can allow the reshape to complete with the following hack:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 77dfd720aaa0..e6c68a450d4c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4306,7 +4306,7 @@ static void handle_stripe(struct stripe_head *sh)
 	 */
 	if (s.failed > conf->max_degraded) {
 		sh->check_state = 0;
-		sh->reconstruct_state = 0;
+//		sh->reconstruct_state = 0;
 		if (s.to_read+s.to_write+s.written)
 			handle_failed_stripe(conf, sh, &s, disks, &s.return_bi);
 		if (s.syncing + s.replacing)

It may not do exactly the right thing, but it won't be too bad.

I'm tempted to simply disable reshapes if there are bad blocks, but that
might not be necessary.

The presence of a 'bad block' can mean two things.

1/ The data is missing.  If there are enough bad blocks in a stripe then
   some data cannot be recovered.  In that case we can only let the 'grow'
   proceed if we record the destination blocks as 'bad', which isn't too hard.

2/ The media is faulty and writes fail.  A 'bad block' doesn't always mean
   this, but it can, and it is hard to tell whether it does.
   This case only really matters when writing.  I could probably just
   over-write anyway and handle failure as we normally would.
   If the 'write' succeeds, I need to clear the 'bad block' record, but I
   think I do that anyway.

So I should be able to make it work.  I'll probably get mdadm to warn
strongly against reshaping an array with bad blocks though.

I'm going to have to study the code some more.

NeilBrown