Hi Mariusz,
On 07.05.24 at 09:32, Mariusz Tkaczyk wrote:
On Tue, 9 Apr 2024 01:31:35 +0200
Sven Köhler <sven.koehler@xxxxxxxxx> wrote:
I strongly believe that mdadm should ignore any metadata - regardless of
the version - that is at a location owned by any of the partitions.
That would require mdadm to understand the GPT partition table, not only clone it.
We have GPT support to clone the GPT metadata (see super-gpt.c).
It should save us from such issues, so you have my ack if you want to do this.
I get your point. That seems wrong to me. I wonder whether the kernel
has some interface for gathering information about the partitions on a device.
After all, the kernel knows about lots of partition table types (MBR, GPT, ...).
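For what it's worth, the kernel does export per-partition geometry through sysfs
(/sys/block/<disk>/<part>/start and .../size, both in 512-byte sectors), so such a
check would not have to parse the partition table itself. A minimal, purely
illustrative sketch (not mdadm code) of what I have in mind:

/*
 * Illustrative sketch only (not mdadm code): decide whether a byte
 * offset on a whole disk falls inside one of its partitions, using
 * the start/size files the kernel exports under /sys/block/<disk>/,
 * both given in 512-byte sectors.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long read_ull(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned long long v = 0;

	if (f) {
		if (fscanf(f, "%llu", &v) != 1)
			v = 0;
		fclose(f);
	}
	return v;
}

/* Return 1 if byte offset 'off' on /dev/<disk> lies inside a partition. */
static int offset_in_partition(const char *disk, unsigned long long off)
{
	char base[64], path[320];
	struct dirent *de;
	DIR *d;
	int hit = 0;

	snprintf(base, sizeof(base), "/sys/block/%s", disk);
	d = opendir(base);
	if (!d)
		return 0;

	while ((de = readdir(d)) != NULL) {
		unsigned long long start, size;

		/* Partitions show up as subdirectories named <disk><n>. */
		if (strncmp(de->d_name, disk, strlen(disk)) != 0)
			continue;

		snprintf(path, sizeof(path), "%s/%s/start", base, de->d_name);
		start = read_ull(path) * 512ULL;
		snprintf(path, sizeof(path), "%s/%s/size", base, de->d_name);
		size = read_ull(path) * 512ULL;

		if (size && off >= start && off < start + size) {
			hit = 1;
			break;
		}
	}
	closedir(d);
	return hit;
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <disk> <byte-offset>\n", argv[0]);
		return 1;
	}
	printf("%s\n", offset_in_partition(argv[1], strtoull(argv[2], NULL, 0))
	       ? "inside a partition" : "outside all partitions");
	return 0;
}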
But... GPT should have a secondary header located at the end of the device, so
your metadata should not be at the end. Are you using a GPT or MBR partition table?
Maybe the missing secondary GPT header is the reason?
I just checked. My disks don't have a GPT backup at the end. I might
have converted an MBR partition table to a GPT. That would not create a
backup GPT if the space is already occupied by a partition.
That said, for the sake of argument, I might just as well be using an
MBR partition table.
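In case anyone wants to check their own disks: the backup GPT header lives in the
last LBA and starts with the 8-byte signature "EFI PART", so a rough probe (assuming
512-byte logical sectors, and skipping the CRC checks a real GPT parser would do)
could look like this:

/*
 * Rough sketch: probe the last logical sector of a block device for a
 * backup GPT header. Assumes 512-byte logical sectors and only checks
 * the "EFI PART" signature, not the CRCs a real GPT parser verifies.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	unsigned char sect[512];
	off_t end;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Device size, then read the last 512-byte sector. */
	end = lseek(fd, 0, SEEK_END);
	if (end < 512 ||
	    pread(fd, sect, sizeof(sect), end - 512) != (ssize_t)sizeof(sect)) {
		perror("read");
		close(fd);
		return 1;
	}
	close(fd);

	/* The GPT header starts with the 8-byte signature "EFI PART". */
	puts(memcmp(sect, "EFI PART", 8) == 0 ?
	     "backup GPT header found in the last sector" :
	     "no backup GPT header in the last sector");
	return 0;
}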
While I'm not 100% sure how to implement that, the following might also
work: first scan the partitions for metadata, then ignore the parent
device's metadata if its UUID was already found on one of the partitions.
No, that is not an option. In the udev world, you should only operate on the device
you are currently processing, so we should avoid referencing the rest of the system.
Hmm, I think I know what you mean.
BTW, to avoid this issue you can leave a few bytes empty at the end of the disk; simply
make your last partition end a few bytes before the end of the drive. With that, the
metadata will not be recognized directly on the drive. That is at least what I
expect, but I have not tested it myself, so please be aware of that.
I verified that my last partition ends at the last sector of the disk.
Pretty sure that means it must have been an MBR partition table once upon a time.
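For reference, the 0.90 superblock location follows the MD_NEW_SIZE_SECTORS()
arithmetic from linux/raid/md_p.h: round the device size down to a 64 KiB boundary
and step back one 64 KiB block. A small, purely illustrative sketch of that
calculation:

/*
 * Illustrative only: where a 0.90 superblock ends up, using the same
 * arithmetic as MD_NEW_SIZE_SECTORS() in linux/raid/md_p.h: round the
 * size down to a 64 KiB boundary and step back one 64 KiB block.
 * All sizes/offsets below are in 512-byte sectors.
 */
#include <stdio.h>
#include <stdlib.h>

#define MD_RESERVED_SECTORS 128ULL	/* 64 KiB in 512-byte sectors */

static unsigned long long sb0_offset(unsigned long long sectors)
{
	return (sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS;
}

int main(int argc, char **argv)
{
	unsigned long long disk, part_start, part_size;

	if (argc != 4) {
		fprintf(stderr,
			"usage: %s <disk-sectors> <part-start> <part-sectors>\n",
			argv[0]);
		return 1;
	}
	disk = strtoull(argv[1], NULL, 0);
	part_start = strtoull(argv[2], NULL, 0);
	part_size = strtoull(argv[3], NULL, 0);

	/* Where a whole-disk probe and a partition probe would look. */
	printf("whole-disk 0.90 superblock: sector %llu\n", sb0_offset(disk));
	printf("partition  0.90 superblock: sector %llu\n",
	       part_start + sb0_offset(part_size));
	return 0;
}

With a 64 KiB-aligned last partition that runs to the last sector of the disk, both
probes resolve to the same sector, which is exactly the ambiguity we are discussing;
leaving at least 64 KiB (rather than just a few bytes) free at the end of the disk
guarantees the two locations differ.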
This is not about me. I'm not asking for support of my case just for the sake of
having my system work. I already converted to metadata 1.2, and that
fixed the issue regardless of where the last partition ends.
It's a regression, in the sense that my system had worked for years and
after an upgrade suddenly didn't. I'd like to prevent the same thing from
happening to others. It was pretty scary, even though no data seems to
have been lost.
I did the right thing and converted my RAID arrays to metadata 1.2, but
I'd like to save others from the adrenaline shock.
There are reasons why we introduced v1.2, located at the beginning of the device.
You can try to fix it, but I think that you should just follow upstream and
choose 1.2 if you can.
Yes, I agree with you. That's why I migrated to 1.2 already.
As more and more arrays use 1.2, we naturally care less about 0.9,
especially about workarounds in other utilities. We cannot control
whether legacy workarounds are still there (the root cause of this change may be
outside md/mdadm, you never know :)).
Likely, the reason is outside of the mdadm binary but inside the mdadm
repo. Arch Linux uses the udev rules provided by the mdadm package
without modification. The diff on the udev rules between the mdadm 4.2 and
4.3 releases is significant. Both invoke mdadm -If $name, but the order
has likely changed.
An investigation of that is still pending. I'm not an expert in udev
debugging, and the logs don't show much.
So cases like that will always come up. It is right to use 1.2 now, to be
better supported, if you don't have a strong need to stay with 0.9.
Would it be possible to have automated tests for incremental raid
assembly via udev rules? I'm not an expert in udev though.
Anyway, patches are always welcomed!
Still working on my udev debugging skills. But afterwards, I may very
well prepare a patch.
Best,
Sven