Re: mdadm 3.3 fails to kick out non fresh disk

On Fri, Sep 27, 2013 at 10:26 AM, Francis Moreau <francis.moro@xxxxxxxxx> wrote:
> Hello Martin,
>
> Sorry for the late answer, I was busy with some other stuff.
>
> On Mon, Sep 23, 2013 at 10:02 PM, Martin Wilck <mwilck@xxxxxxxx> wrote:
>> On 09/21/2013 03:22 PM, Francis Moreau wrote:
>>> On Fri, Sep 20, 2013 at 11:08 PM, Francis Moreau <francis.moro@xxxxxxxxx> wrote:
>>>> Hello Martin,
>>>>
>>>> On Fri, Sep 20, 2013 at 8:07 PM, Martin Wilck <mwilck@xxxxxxxx> wrote:
>>>>> On 09/20/2013 10:56 AM, Francis Moreau wrote:
>>>>>> Hello Martin,
>>>>>>
>>>>>> On Mon, Sep 16, 2013 at 7:04 PM, Martin Wilck <mwilck@xxxxxxxx> wrote:
>>>>>>> On 09/16/2013 03:56 PM, Francis Moreau wrote:
>>>>>>>
>>>>>>>> I gave your patch "DDF: compare_super_ddf: fix sequence number
>>>>>>>> check" a try, and mdadm is now able to detect a difference between
>>>>>>>> the two disks. It therefore refuses to insert the second disk, which
>>>>>>>> is an improvement.
>>>>>>>>
>>>>>>>> However, it is still not able to detect which copy is the fresher
>>>>>>>> one, the way mdadm does with software RAID1 (metadata 1.2), so it
>>>>>>>> cannot kick out the first disk if that is the outdated one.
>>>>>>>>
>>>>>>>> Is that expected?
>>>>>>>
>>>>>>> At the moment, yes. This needs work.
>>>>>>>
>>>>>>
>>>>>> Actually this is worse than I thought: with your patch applied, mdadm
>>>>>> refuses to add a spare disk back into a degraded DDF array.
>>>>>>
>>>>>> For example on a DDF array:
>>>>>>
>>>>>> # cat /proc/mdstat
>>>>>> Personalities : [raid1]
>>>>>> md126 : active raid1 sdb[1] sda[0]
>>>>>>       2064384 blocks super external:/md127/0 [2/2] [UU]
>>>>>>
>>>>>> md127 : inactive sdb[1](S) sda[0](S)
>>>>>>       65536 blocks super external:ddf
>>>>>>
>>>>>> unused devices: <none>
>>>>>>
>>>>>> # mdadm /dev/md126 --fail sdb
>>>>>> [   24.118434] md/raid1:md126: Disk failure on sdb, disabling device.
>>>>>> [   24.118437] md/raid1:md126: Operation continuing on 1 devices.
>>>>>> mdadm: set sdb faulty in /dev/md126
>>>>>>
>>>>>> # mdadm /dev/md127 --remove sdb
>>>>>> mdadm: hot removed sdb from /dev/md127
>>>>>>
>>>>>> # mdadm /dev/md127 --add /dev/sdb
>>>>>> mdadm: added /dev/sdb
>>>>>>
>>>>>> # cat /proc/mdstat
>>>>>> Personalities : [raid1]
>>>>>> md126 : active raid1 sda[0]
>>>>>>       2064384 blocks super external:/md127/0 [2/1] [U_]
>>>>>>
>>>>>> md127 : inactive sdb[1](S) sda[0](S)
>>>>>>       65536 blocks super external:ddf
>>>>>>
>>>>>> unused devices: <none>
>>>>>>
>>>>>>
>>>>>> As you can see, the reinserted disk sdb just sits there as a spare and
>>>>>> isn't added back to the array.
>>>>>
>>>>> That's correct. You marked that disk failed.
>>>>>
>>>>>> Is it possible to make this major feature work again while keeping your improvement?
>>>>>
>>>>> No. A failed disk can't be added again without a rebuild. I am positive
>>>>> about that.
>>>>>
>>>>
>>>> Hmm, that's not the case with Linux software RAID AFAICS: if you do the
>>>> same thing with software RAID, the reinserted disk is added back to the
>>>> array and resynchronised automatically. You can try it easily.
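
To be concrete, the software-RAID test I had in mind above is roughly
the following; the loop devices are only placeholders for two spare
block devices:

# mdadm --create /dev/md0 --level=1 --metadata=1.2 --raid-devices=2 /dev/loop0 /dev/loop1
# mdadm /dev/md0 --fail /dev/loop1
# mdadm /dev/md0 --remove /dev/loop1
# mdadm /dev/md0 --add /dev/loop1
# cat /proc/mdstat

Here the re-added device goes straight into recovery instead of sitting
around as a spare.
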
>>>
>>
>> Sorry, I didn't read your problem description carefully enough. You used
>> mdadm --add, and that should work and should trigger a rebuild, as you said.
>>
>>> BTW, that's also the case for DDF if I don't apply your patch.
>>
>> I don't understand this. My patch doesn't change the behavior of "mdadm
>> --add". AFAICS compare_super() isn't called in that code path.
>>
>> I just posted two unit tests that cover this use (or rather, failure)
>> case; please verify that they match your scenario.
>>
>> On my system, with my latest patch, these tests are successful.
>>
>> I also tried a VM, as you suggested, and did exactly what you described,
>> successfully. After failing/removing one disk and rebooting, the system
>> comes up degraded; mdadm -I on the old disk fails (which is correct), but
>> I can mdadm --add the old disk and recovery starts automatically. So all
>> is fine; the question is why it doesn't work on your system.
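
Just to be sure we ran the same steps: the sequence I understand from
your description, using the device names from my example (the reboot is
of course VM-specific), is roughly:

# mdadm /dev/md126 --fail sdb
# mdadm /dev/md127 --remove sdb
# reboot
# mdadm -I /dev/sdb                  (expected to be rejected)
# mdadm /dev/md127 --add /dev/sdb
# cat /proc/mdstat                   (recovery should now be running)
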
>
> Maybe the kernel is different? I'm using 3.4.62.
>
>>
>>> Additional information: looking at sda shows that it doesn't seem to
>>> have metadata anymore after having added it to the container:
>>>
>>> # mdadm -E /dev/sda
>>> /dev/sda:
>>>    MBR Magic : aa55
>>> Partition[0] :      3564382 sectors at         2048 (type 83)
>>> Partition[1] :       559062 sectors at      3569643 (type 05)
>>
>> I wonder if this gives us a clue. It seems that something erased the
>> metadata. I can't imagine that mdadm did that; I wonder whether it could
>> have been your BIOS. However, mdadm --add should work even if the BIOS
>> had changed something on the disk. I admit I'm clueless here.
>>
>> In order to make progress, we'd need mdadm -E output of both disks
>> before and after the BIOS gets to write to them, after boot, and after
>> you try mdadm --add. The mdmon logs would also be highly appreciated,
>> but they'll probably be hard for you to generate: you need to compile
>> mdmon with CXFLAGS="-DDEBUG=1 -g" and make sure mdmon's stderr is
>> captured somewhere.
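
To make sure I understand the logging part, I assume you mean something
like this, with the log path being just an example:

# make CXFLAGS="-DDEBUG=1 -g"
# ./mdmon --takeover /dev/md127 2> /tmp/mdmon.log

i.e. rebuild mdmon with debugging enabled and redirect its stderr to a
file while it takes over the container.
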
>
> I'm not sure why you're talking about the BIOS here: my VM wasn't
> rebooted during the tests described above. BTW, I'm using QEMU to run
> my VM.

I finally found my issue: the mdmon --takeover service was no longer
being started (I probably messed it up earlier), so the mdmon instance
started by the initrd was still in use and wasn't working properly.
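
In case someone else hits this, what I ended up checking was roughly the
following (md127 being the DDF container from the example earlier in
this thread):

# ps -C mdmon -o pid,args
# mdmon --takeover /dev/md127

The first command shows which mdmon instance is currently managing the
container; the second lets a freshly started mdmon take over from the
one left behind by the initrd.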

If you still want me to test something, please tell me.

OTOH, it would be easier if you set up a git tree somewhere with the
patches you want me to test. BTW, I'm not subscribed to the linux-raid
mailing list.

Thanks.
-- 
Francis



