Re: want-replacement got stuck?

"George Spelvin" <linux@xxxxxxxxxxx> · 21 Nov 2012 22:25:04 -0500

Some more information...

From the "stuck" state, I rebooted the machine.  It came up with

md5 : active raid10 sde2[2] sdd2[3] sda2[0] sdb2[1]
      725591552 blocks 256K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 172/173 pages [688KB], 2048KB chunk

and e2fsck found severe problems, like multiply-referenced blocks.

I compared sdd2 and sde2 with cmp, and it found tons of
differences.  So I knew what the problem was.  All I had to do
was pick the right one to fail.
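cmp reports every differing byte. A coarser, chunk-granular comparison makes it easier to see which regions of the two mirrors disagree. Below is a minimal Python sketch of that idea. differing_chunks is a hypothetical helper, not part of any md tool. It assumes both members use the same data offset and are read while the array is stopped, since in-flight writes would otherwise show up as differences. Expect the md metadata at the start or end of each member to differ regardless, because each superblock records that device's own role.

```python
CHUNK = 256 * 1024  # matches "256K chunks" in the /proc/mdstat output above

def differing_chunks(path_a, path_b, chunk=CHUNK):
    """Yield the index of every chunk-sized region where the two inputs differ.

    Hypothetical helper: works on raw partitions or on image files of them.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:  # both inputs exhausted
                break
            if a != b:  # includes the case where one input is shorter
                yield index
            index += 1

if __name__ == "__main__":
    import sys
    # e.g. python differing_chunks.py /dev/sdd2 /dev/sde2  (needs read access)
    for i in differing_chunks(sys.argv[1], sys.argv[2]):
        print(f"chunk {i} (byte offset {i * CHUNK}) differs")
```

Reporting per chunk rather than per byte lines the output up with md's own unit of placement, which makes it easier to relate differences to the array layout.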

Fortunately, I had the last RAID config on the screen of the
machine I had sshed in from, and decided I trusted sdd2 less,
so I failed it.

After flushing the device's buffer cache (hdparm -f /dev/md5), the errors
went away!  I was left with only what the original e2fsck -p had done
before halting.  (Namely, some updates to i_blocks.)

Now I've zeroed sdd2's superblock and added it back, and things seem
to be working okay.
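The fail / zero / re-add sequence can be written out as a dry run. This is a sketch, not a tool: readd_member_commands is a hypothetical helper that only builds mdadm argument lists, using mdadm's standard --fail, --remove, --zero-superblock and --add options, and prints them instead of running them. As I understand it, zeroing the superblock is what makes md treat the disk as a brand-new member and rebuild it completely, rather than re-adding it and catching up from the write-intent bitmap.

```python
def readd_member_commands(array, member):
    """Return, in order, the mdadm invocations that drop a member and add it back blank.

    Hypothetical helper: it only builds argv lists; nothing is executed.
    """
    return [
        ["mdadm", array, "--fail", member],      # mark the member faulty
        ["mdadm", array, "--remove", member],    # detach it from the array
        ["mdadm", "--zero-superblock", member],  # erase its md metadata
        ["mdadm", array, "--add", member],       # add it back; md rebuilds it from scratch
    ]

if __name__ == "__main__":
    for argv in readd_member_commands("/dev/md5", "/dev/sdd2"):
        print(" ".join(argv))  # dry run: print instead of executing
```

Keeping it as a printed list makes it easy to double-check the device names before typing anything that destroys metadata.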

NeilBrown <neilb@xxxxxxx> wrote:
> Yes.... this is a real worry.  Fortunately I know what is causing it.

Yay!  Tell me when you have a patch to test.

> Meanwhile you have a corrupted filesystem.  Sorry.
> The nature of the corruption is that since the replacement finished
> no writes have gone to slot-3 at all.  So if md ever decides to read
> from slot 3 it will get stale data.

That's sort of what the pattern of errors looked like.
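Why sdd2 and sde2 were the pair worth comparing, and why the damage was patchy: with 4 devices and 2 near-copies, md places the two copies of each chunk in adjacent slots, so slots 0/1 and slots 2/3 form mirror pairs. Below is a simplified Python model of that placement. It is my own sketch, not md's code. It ignores the data offset and assumes far_copies = 1 and the slot numbers shown in /proc/mdstat above (sda2[0] sdb2[1] sde2[2] sdd2[3]). If nothing reached slot 3, only chunks mirrored on slots 2 and 3 can be stale, and only reads that md happens to send to slot 3 would see the stale copy.

```python
# Simplified model of the md raid10 "near" layout (far_copies = 1 assumed).
RAID_DISKS = 4   # sda2[0] sdb2[1] sde2[2] sdd2[3]
NEAR_COPIES = 2  # "2 near-copies"

def near_layout(chunk, raid_disks=RAID_DISKS, near_copies=NEAR_COPIES):
    """Return (slot, chunk_on_that_device) for each copy of an array chunk."""
    placements = []
    for copy in range(near_copies):
        # Copies of one chunk take consecutive positions, filled row by row across slots.
        pos = chunk * near_copies + copy
        placements.append((pos % raid_disks, pos // raid_disks))
    return placements

def chunks_touching_slot(slot, n_chunks, **layout):
    """Array chunks (among the first n_chunks) that keep a copy on the given slot."""
    return [c for c in range(n_chunks)
            if any(s == slot for s, _ in near_layout(c, **layout))]

if __name__ == "__main__":
    for c in range(4):
        print(c, near_layout(c))
```

With these parameters chunk 0 lands on slots 0 and 1, chunk 1 on slots 2 and 3, chunk 2 on slots 0 and 1 again, and so on, so slot 3 (sdd2) only ever mirrors slot 2 (sde2).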

> I suggest you fail the sdd2, reboot, make sure only sda2, sdb2, sde2 are
> in the array, run fsck, and then if it seems happy enough, add sdc2
> and/or sdd2 back in so they rebuild completely.

I did this in a sort of bass-ackward way, but I accomplished it in
the end.  And no data loss.  Yippee!

> Thanks for helping to make md better by risking your data :-)

I'm just glad I suffered less damage than my recent ext4 resizing
experiments, which were.... not completely successful.

Anyway, thanks for the help, and all the hard work.