Re: raid6 rebuild not starting

On Mon, Dec 12, 2011 at 9:10 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Mon, 12 Dec 2011 08:42:49 +0200 Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>
>> On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>> >
>> >> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> >> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>> >> >
>> >> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> >> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>> >> >> >
>> >> >> >> Hi!
>> >> >> >>
>> >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
>> >> >> >> Instead, there is a flood of "RAID conf printout" messages that
>> >> >> >> seemingly appear whenever there is array activity.
>> >> >> >>
>> >> >> >> All the devices show up properly in --detail and two devices are marked
>> >> >> >> as "spare rebuilding", and I can access the contents of the array just
>> >> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
>> >> >> >> missing something? :)
>> >> >> >>
>> >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
>> >> >> >> the same issue. mdadm is 3.1.5.
>> >> >> >>
>> >> >> >> I'm not using start_ro, and writing to the array doesn't trigger a
>> >> >> >> rebuild either.
>> >> >> >>
>> >> >> >> Attached are --examine outputs before assembly, kernel log output on
>> >> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>> >> >> >>
>> >> >> >
>> >> >> > Thank you for the very detailed problem report.
>> >> >>
>> >> >> Thanks for the quick response :)
>> >> >>
>> >> >> > Unfortunately it is a complete mystery to me what is happening.
>> >> >> >
>> >> >> > The repeated "RAID conf printout" messages are almost certainly coming from
>> >> >> > the end of raid5_remove_disk.
>> >> >> > It is being called from remove_and_add_spares for each of the two devices
>> >> >> > that are being rebuilt.  raid5_remove_disk declines to remove them because it
>> >> >> > can keep rebuilding them.
>> >> >> >
>> >> >> > remove_and_add_spares then counts them and notes there are 2.
>> >> >> > md_check_recovery notes that this is > 0, so it should create a thread to run
>> >> >> > md_do_sync.
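>> >> >> >
>> >> >> > (For context, the decision point in md_check_recovery -- paraphrased
>> >> >> > from memory of the 3.1-era md.c, not quoted verbatim -- is roughly:
>> >> >> >
>> >> >> >    } else if ((spares = remove_and_add_spares(mddev))) {
>> >> >> >            /* partially-rebuilt devices count as spares */
>> >> >> >            set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>> >> >> >    }
>> >> >> >    ...
>> >> >> >    mddev->sync_thread = md_register_thread(md_do_sync,
>> >> >> >                                            mddev, "resync");
>> >> >> >
>> >> >> > i.e. a non-zero return from remove_and_add_spares is what leads to
>> >> >> > the sync thread being registered.)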
>> >> >> >
>> >> >> > md_do_sync should then print out a message like
>> >> >> >  md: recovery of RAID array md0
>> >> >> >
>> >> >> > but it doesn't.  So something went wrong.
>> >> >> > There are three reasons that md_do_sync might not print a message:
>> >> >> >
>> >> >> > 1/ MD_RECOVERY_DONE is set.  As only md_do_sync ever sets it, that is
>> >> >> >    unlikely, and in any case md_check_recovery clears it.
>> >> >> > 2/ mddev->ro != 0.  It is only ever set to 0, 1, or 2.  If it is 1 or 2
>> >> >> >   then we would be able to see that in /proc/mdstat as a "(readonly)"
>> >> >> >   status.  But we don't.
>> >> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this.  It does
>> >> >> >   get set if kthread_should_stop() returns 'true', but that should only
>> >> >> >   happen if kthread_stop() was called.  That is only called by
>> >> >> >   md_unregister_thread, and I cannot see any way that could be called.
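>> >> >> >
>> >> >> > (For reference, the early exits at the top of md_do_sync look
>> >> >> > roughly like this -- again paraphrased from memory, not verbatim:
>> >> >> >
>> >> >> >    /* just in case the thread restarts... */
>> >> >> >    if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>> >> >> >            return;
>> >> >> >    if (mddev->ro) /* never try to sync a read-only array */
>> >> >> >            return;
>> >> >> >
>> >> >> > and MD_RECOVERY_INTR causes a "goto skip" a little further down,
>> >> >> > before the "md: recovery of RAID array" message gets printed.)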
>> >> >> >
>> >> >> > So.  No idea.
>> >> >> >
>> >> >> > Are you compiling these kernels yourself?
>> >> >>
>> >> >> Nope (used Mageia kernels), but I did now (3.1.5).
>> >> >>
>> >> >> > If so, could you:
>> >> >> >  - put a printk at the top of md_do_sync to report the values of
>> >> >> >   mddev->recovery and mddev->ro
>> >> >> >  - print a message whenever md_unregister_thread is called
>> >> >> >  - in md_check_recovery, in the
>> >> >> >                if (mddev->ro) {
>> >> >> >                        /* Only thing we do on a ro array is remove
>> >> >> >                         * failed devices.
>> >> >> >                         */
>> >> >> >                        mdk_rdev_t *rdev;
>> >> >> >
>> >> >> >  if statement, print the value of mddev->ro.
>> >> >> >
>> >> >> > Then see which of those printk's fire, and what they tell us.
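>> >> >> >
>> >> >> > Something along these lines would do (illustrative only):
>> >> >> >
>> >> >> >    /* at the top of md_do_sync() */
>> >> >> >    printk("md_do_sync: recovery=%lx ro=%d\n",
>> >> >> >           mddev->recovery, mddev->ro);
>> >> >> >
>> >> >> >    /* at the top of md_unregister_thread() */
>> >> >> >    printk("md_unregister_thread called\n");
>> >> >> >
>> >> >> >    /* at the "if (mddev->ro)" test in md_check_recovery() */
>> >> >> >    printk("md_check_recovery: ro=%d\n", mddev->ro);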
>> >> >>
>> >> >> Only the last one does, and mddev->ro == 0.
>> >> >>
>> >> >> For reference, attached is the used patch and resulting log output.
>> >> >>
>> >> >
>> >> > Thanks.
>> >> >
>> >> > So it isn't running md_do_sync at all. Odd.
>> >> >
>> >> > Could you please add:
>> >> >  - a "WARN_ON(1);" call in print_raid5_conf() so we get a stack trace and can
>> >> >    see who is calling it.
>> >> >  - print the value that remove_and_add_spares is going to return.
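>> >> >
>> >> > e.g. something like (illustrative only):
>> >> >
>> >> >    /* in print_raid5_conf(), to get a backtrace of whoever calls it */
>> >> >    WARN_ON(1);
>> >> >
>> >> >    /* at the end of remove_and_add_spares() */
>> >> >    printk("remove_and_add_spares: returning %d\n", spares);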
>> >>
>> >> Attached. As you can see, remove_and_add_spares returns 0.
>> >>
>> >> --
>> >> Anssi Hannula
>> >
>> >
>> > Please add:
>> >
>> > diff --git a/drivers/md/md.c b/drivers/md/md.c
>> > index 5c95ccb..fa56ac5 100644
>> > --- a/drivers/md/md.c
>> > +++ b/drivers/md/md.c
>> > @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev)
>> >                        }
>> >                }
>> >
>> > +       printk("degraded=%d\n", mddev->degraded);
>> >        if (mddev->degraded) {
>> >                list_for_each_entry(rdev, &mddev->disks, same_set) {
>> > +                       printk("raid_disk=%d flags=%lx\n", rdev->raid_disk, rdev->flags);
>> >                        if (rdev->raid_disk >= 0 &&
>> >                            !test_bit(In_sync, &rdev->flags) &&
>> >                            !test_bit(Faulty, &rdev->flags))
>> >
>> >
>> > 'degraded' must be 2 as dmesg contains
>> >
>> > [   45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
>> >
>> > and 'degraded' is exactly the difference between '8' and '10' there.
>> >
>> > raid disks 3 and 7 must have In_sync and Faulty clear as both of them just
>> > show "spare rebuilding" in the 'detail' output.
>> >
>> > so remove_and_add_spares "must" return 2.
>> >
>> > Hopefully the above patch will help me understand which of those is wrong.
>>
>> The output is:
>> [   47.389379] md0: degraded=2
>> [   47.389380] md0: raid_disk=0 flags=4
>> [   47.389381] md0: raid_disk=-1 flags=0
>>
>> Full assemble log attached.
>>
>
> Bingo.
>
> This will fix it.   We don't really need that 'break' there, and it is a
> problem.
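>
> (The 'break' is the one at the end of the hot-add loop in
> remove_and_add_spares(): hot_add_disk() evidently failed for the
> raid_disk=-1 device above, and the 'break' then aborted the loop before
> it ever reached the two rebuilding devices, so they were never counted.
> Reconstructed from memory of the 3.1 source -- not necessarily the
> exact committed patch -- the fix is essentially:
>
>                        if (mddev->pers->hot_add_disk(mddev, rdev) == 0) {
>                                ...
>                                spares++;
>                                md_new_event(mddev);
> -                      } else
> -                              break;
> +                      }
>
> so a failure to hot-add one device no longer stops the remaining
> devices from being counted.)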
>
> Thanks.

Confirmed. Thanks for the quick fix :)

-- 
Anssi Hannula