Hi Chris,
I've run partitioned MD disks for several years now. I do that on systems
where I use md for the system partitions: one mirror, with partitions for
the different system aspects. I prefer that, as it best reflects the
actual physical configuration, and all partitions will be degraded at
the same time when one disk develops a problem (which is unfortunately not
the case when you partition the disks first and then mirror the partitions).
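Roughly, that kind of setup looks like this (just a sketch; the device
names and sizes are made up, and on recent kernels any md array can be
partitioned directly):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  # partition the array itself, not the member disks
  parted /dev/md0 mklabel gpt
  parted /dev/md0 mkpart root ext4 1MiB 20GiB
  parted /dev/md0 mkpart home ext4 20GiB 100%
  # the partitions then show up as /dev/md0p1, /dev/md0p2, ...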
As I am a bit lazy and have only a limited wish to fight with
BIOS/bootloader conflicts and vagaries, these systems typically boot from
the network (the kernel gets loaded over the network; from there onwards
everything is on the local disk).
Cheers,
Rudy
On 05/13/2011 08:54 PM, Christopher White wrote:
Hello again Phil (and Roman). Thanks to your back-and-forth, the bug
has now finally been narrowed down completely: it is a bug in (g)parted!
The issue is that (g)parted doesn't properly ask the kernel to re-scan
the device's partition table when you operate on md disks, the way it
does for physical disks.
Your information (Phil) that (g)parted chokes on an assertion is good
information for when I report this bug. It may well be that md disks
must be handled differently from physical disks and that (g)parted is
not aware of that distinction, which is why it chokes on the partition
table rescan.
Either way, this is fantastic news, because it means it's not an md
kernel bug, where waiting for a fix would have severely pushed back my
current project. I'm glad it was simply (g)parted failing to tell the
kernel to re-read the partition tables.
---
With this bug out of the way (I'll be reporting it to parted's mailing
list now), one thing that's been bugging me during my hours of research
is that the vast majority of users either use a single, large RAID
array and virtually partition it with LVM, or alternatively break
each disk into many small partitions and build multiple smaller
arrays out of those partitions. Very few people seem to use md's
built-in support for partitionable RAID arrays.
This makes me a tiny bit wary of trusting the stability of md's
partitionable implementation, even though I suspect it is rock solid.
I suspect the reason most people don't use the feature is
legacy/habit, since md used to support only a single
partition, so there's a vast amount of guides telling people to use
LVM. Do any of you know anything about this, and can you advise on
whether I should go for a single-partition MD array with LVM, or a
partitionable MD array?
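For reference, the two setups I'm weighing look roughly like this (only
a sketch; device, volume group and partition names are made up):

  # Option A: one big array, virtually partitioned with LVM
  sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
  sudo pvcreate /dev/md0
  sudo vgcreate vg0 /dev/md0
  sudo lvcreate -L 50G -n data vg0

  # Option B: the same array, partitioned directly (partitionable md)
  sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
  sudo parted /dev/md0 mklabel gpt
  sudo parted /dev/md0 mkpart data ext4 1MiB 100%
  # -> the partition appears as /dev/md0p1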
As far as performance goes, the CPU overhead of LVM is in the 1-5%
range from what I've heard, and I have zero need for the other
features LVM provides (snapshots, backups, online resizing, clusters
of disks acting as one disk, etc.), so it just feels like complete
overkill when all I need is a single, partitionable RAID array.
All I need is the ability to (in the future) add more disks to the
array, grow the array, and then resize and move the partitions around
using regular partitioning tools, treating the RAID array as a single
disk. md's partitionable arrays support this, since the array acts as
a disk: when you add more hard disks to it, the available, unallocated
space on the array simply grows, and the partitions on it can be
expanded and relocated to take advantage of that. I don't need LVM for
any of that, as long as md's implementation is stable.
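To illustrate what I mean (a rough sketch with made-up device names,
assuming a RAID level that gains capacity from extra disks, e.g. RAID5):

  sudo mdadm --add /dev/md0 /dev/sdd
  sudo mdadm --grow /dev/md0 --raid-devices=4
  # once the reshape finishes, the extra capacity appears as
  # unallocated space at the end of /dev/md0, and the partitions on
  # it can then be grown/moved with parted or gparted like on any disk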
Christopher
On 5/13/11 8:18 PM, Phil Turmel wrote:
Hi Christopher,
On 05/13/2011 02:04 PM, Christopher White wrote:
On 5/13/11 7:40 PM, Roman Mamedov wrote:
On Fri, 13 May 2011 19:32:23 +0200
Christopher White <linux@xxxxxxxxxxxxxx> wrote:
I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
creating two partitions that way. It fails too.
This leads me to conclude that /dev/md1 was never created in
partitionable mode and that the kernel refuses to create anything
beyond
a single partition on it.
Did you try running "blockdev --rereadpt /dev/md1"?
Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
That's weird! Here's the thing: Fdisk is *just* for creating the
partitions, not formatting them, so for that one it makes sense that
you must re-read the partition table before you have a partition
device to execute "mkfs.XXX" on.
Gparted, on the other hand, is BOTH for creating partition
tables AND for executing the "make filesystem" commands
(formatting). Therefore, Gparted is supposed to tell the kernel
about partition table changes BEFORE trying to access the partitions
it just created. Basically, Gparted goes: blank disk, create
partition table, create partitions, notify the OS to re-scan the table,
THEN access the new partition devices and format them. But instead,
it skips the "notify OS" part when working with md arrays!
When you use Gparted on PHYSICAL hard disks, it properly creates the
partition table and the OS is updated to immediately see the new
partition devices, to allow them to be formatted.
Indeed. I suspect (g)parted is in fact requesting a rescan, but is
being ignored.
I just tried this on one of my servers, and parted (v2.3) choked on
an assertion. Hmm.
Therefore, what this has shown is that the necessary procedure in
Gparted is:
* sudo gparted /dev/md1
* Create the partition table (gpt for instance)
* Create as many partitions as you need BUT SET THEIR TYPE TO
"unformatted" (extremely important).
* Go back to a terminal and execute "sudo blockdev --rereadpt
/dev/md1" to let the kernel see the new partition devices
* Now go back to Gparted and format the partitions, or just do it
the CLI way with mkfs.ext4 manually (a sketch of the CLI route is
below). Either way, it will now work.
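For reference, the pure-CLI equivalent of the same workaround (just a
sketch; the partition names, sizes and filesystems are examples):

  sudo parted /dev/md1 mklabel gpt
  sudo parted /dev/md1 mkpart data1 ext4 1MiB 50%
  sudo parted /dev/md1 mkpart data2 ext4 50% 100%
  sudo blockdev --rereadpt /dev/md1   # let the kernel see the new table
  sudo mkfs.ext4 /dev/md1p1
  sudo mkfs.ext4 /dev/md1p2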
So how should we sum up this problem? Well, that depends. What is
responsible for auto-discovering the new partitions when you use
Gparted on a PHYSICAL disk (which works perfectly without manual
re-scan commands)? 1) Is it Gparted telling the kernel to re-scan,
or 2) is it the kernel that auto-watches physical disks for changes?
Generally, udev does it. But based on my little test, I suspect
parted is at fault. fdisk did just fine.
If 1), it means Gparted needs a bug fix to tell the kernel to
re-scan the partition table for md-arrays when you re-partition them.
If 2), it means the kernel doesn't watch md-arrays for partition
table changes, which debatably it should be doing.
What is ignored or acted upon is decided by udev rules, as far as I
know. You might want to monitor udev events while running some of
your tests (physical disk vs. MD).
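For example (assuming udevadm is available on your distro):

  udevadm monitor --kernel --udev &
  # trigger a rescan on each and compare the events you get:
  blockdev --rereadpt /dev/sda
  blockdev --rereadpt /dev/md1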
Thoughts?
Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html