Re: 4.1-rc6 raid5 OOPS

On Wed, 03 Jun 2015 17:57:43 -0400
Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> wrote:

> NeilBrown <neilb@xxxxxxx> writes:
> > On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen
> > <Jes.Sorensen@xxxxxxxxxx> wrote:
> >
> >> Neil,
> >> 
> >> I was running testing on the current 4.1-rc6 tree (Linus' top of
> >> trunk 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing
> >> the following OOPS which is reproducible.
> >> 
> >> It shows up when running the mdadm test suite, 07changelevelintr
> >> to be specific.
> >> 
> >> Is this something you have seen?
> >> 
> >> Cheers,
> >> Jes
> >> 
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/md/raid5.c:5391!
> >
> > No, I haven't seen that.  And I've been running the test suite
> > quite a bit lately.
> >
> > Can you get it to print out the relevant numbers?  Include
> > readpos/writepos/safepos too.
> 
> This enough? Let me know if you need more.
> 
> I suspect this started happening with the changes that went in between
> 4.1-rc5 and 4.1-rc6. I will try to bisect it tomorrow.
> 
> Cheers,
> Jes
> 
> mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr:
> fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos
> 512  

These numbers suggest that conf->reshape_progress divided by
"data_disks" or "new_data_disks" is -1, or really the unsigned
equivalent, which is MaxSector.
But unless data_disks is 1, ->reshape_progress must really be -2 or -3
or something.
So if you could confirm the values of ->reshape_progress,
data_disks, and new_data_disks, that would help.
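
To illustrate the arithmetic, here is a minimal userspace sketch (not the
raid5.c code). It assumes a backwards reshape in which writepos is stepped
down by reshape_sectors (clamped at zero) while readpos and safepos are
stepped up by reshape_sectors, all in unsigned 64-bit sectors. The struct
and function names are hypothetical. Starting from a quotient of -1 (all
bits set) for writepos and readpos, and 0 for safepos, it reproduces the
values in the report.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the three positions printed in the report. */
struct positions {
	uint64_t writepos;
	uint64_t readpos;
	uint64_t safepos;
};

/*
 * Assumed backwards-reshape adjustment (an illustration, not kernel code):
 * writepos moves down by reshape_sectors, clamped so it cannot pass zero;
 * readpos and safepos move up by reshape_sectors. Unsigned arithmetic
 * wraps modulo 2^64, which is what turns an all-ones readpos into 511.
 */
struct positions step_backwards(uint64_t write_q, uint64_t read_q,
				uint64_t safe_q, uint64_t reshape_sectors)
{
	struct positions p;
	uint64_t dec = write_q < reshape_sectors ? write_q : reshape_sectors;

	p.writepos = write_q - dec;
	p.readpos = read_q + reshape_sectors;
	p.safepos = safe_q + reshape_sectors;
	return p;
}

/* Print the positions as signed values, the way the debug output shows them. */
void print_positions(const struct positions *p)
{
	printf("readpos %lld, writepos %lld, safepos %lld (writepos hex %016llx)\n",
	       (long long)(int64_t)p->readpos,
	       (long long)(int64_t)p->writepos,
	       (long long)(int64_t)p->safepos,
	       (unsigned long long)p->writepos);
}
```

Calling step_backwards(UINT64_MAX, UINT64_MAX, 0, 0x200) gives writepos
-513 (0xfffffffffffffdff, the reported stripe_addr), readpos 511 and
safepos 512.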


I don't think ->reshape_progress could get a negative value in any way
except by being assigned MaxSector.  And that only happens when the
reshape has completely finished.

So it looks like some sort of race.  I have other evidence of a race
with the resync/reshape thread starting/stopping.  If I track that
down it'll probably fix this issue too.

Thanks,
NeilBrown