Neil Brown <neilb@xxxxxxx> writes:

> On Wed, 10 Jun 2015 10:19:42 +1000 Neil Brown <neilb@xxxxxxx> wrote:
>
>> So it looks like some sort of race. I have other evidence of a race
>> with the resync/reshape thread starting/stopping. If I track that
>> down it'll probably fix this issue too.
>
> I think I have found just such a race. If you request a reshape just
> as a recovery completes, you can end up with two reshapes running.
> This causes confusion :-)
>
> Can you try this patch? If I can remember how to reproduce my race
> I'll test it on that too.
>
> Thanks,
> NeilBrown

Hi Neil,

Thanks for the patch - I tried with it applied, but it still crashed
for me :( I had to apply it by hand, as it somehow got mangled in the
email.

Note this is on a mangled RHEL kernel, but it's the same crash I see
on the upstream kernel:

[ 754.303561] md: using 128k window, over a total of 19456k.
[ 754.309706] mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr: fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos 512
[ 754.324486] ------------[ cut here ]------------
[ 754.329649] kernel BUG at drivers/md/raid5.c:5388!

Cheers,
Jes
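
For what it's worth, the reported values alone are enough to see why the
check fires. Below is a minimal userspace sketch of the arithmetic,
assuming the BUG_ON at raid5.c:5388 is reshape_request()'s sanity check
on the backwards-reshape path, where stripe_addr is taken from writepos;
this is a reconstruction for illustration, not a quote of the RHEL or
upstream source:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t sector_t;

int main(void)
{
	/* values taken from the oops above */
	sector_t dev_sectors     = 0x9800;         /* 38912 sectors */
	sector_t reshape_sectors = 0x0200;         /* 512 sectors */
	sector_t sector_nr       = 0;
	sector_t writepos        = (sector_t)-513; /* wrapped negative */

	/* assumption: backwards reshape assigns stripe_addr = writepos */
	sector_t stripe_addr = writepos;           /* 0xfffffffffffffdff */

	/* assumption: the consistency check behind the BUG_ON -
	 * the stripe address implied by the device end must line up
	 * with the sector_nr the reshape loop is currently at */
	sector_t expect = (dev_sectors & ~(reshape_sectors - 1))
			  - reshape_sectors - stripe_addr;

	printf("stripe_addr=%#llx expect=%llu sector_nr=%llu -> %s\n",
	       (unsigned long long)stripe_addr,
	       (unsigned long long)expect,
	       (unsigned long long)sector_nr,
	       expect != sector_nr ? "BUG_ON fires" : "ok");
	return 0;
}

Plugging in the logged numbers gives expect = 38912 - 512 + 513 = 38913,
which can never equal sector_nr = 0: once writepos has wrapped below
zero, no sector_nr satisfies the check. That is consistent with the
two-reshapes theory above, where a second racing reshape thread has
already advanced reshape_progress past the end of the device.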