On 25/03/2010 00:35, Neil Brown wrote:
Greetings. I find myself in the middle of two separate off-list conversations on the same topic, and it has reached the point where I think the conversations really need to be united and brought on-list. So here are my current understanding and thoughts.

The topic is about making rebuild after a failure easier. It strikes me as particularly relevant after the link Bill Davidsen recently forwarded to the list: http://blogs.techrepublic.com.com/opensource/?p=1368

The most significant thing I got from this was a complaint in the comments that managing md raid was too complex and hence error-prone.

I see the issue as breaking down into two parts.
1/ When a device is hot plugged into the system, is md allowed to use it as a spare for recovery?
2/ If md has a spare device, what set of arrays can it be used in if needed?

A typical hot plug event will need to address both of these questions in turn before recovery actually starts.

Part 1.

A newly hotplugged device may have metadata for RAID (0.90, 1.x, IMSM, DDF, other vendor metadata) or LVM or a filesystem. It might have a partition table which could be subordinate to or super-ordinate to other metadata (i.e. RAID in partitions, or partitions in RAID). The metadata may or may not be stale. It may or may not match, either strongly or weakly, metadata on devices in currently active arrays.
Or indeed it may have no metadata at all - it may be a fresh disc. I didn't see that you stated this specifically at any point, though it was there by implication, so I will: you're going to have to pick up hotplug events for bare drives, which presumably means you'll also get events for CD-ROM drives, USB sticks, printers with media card slots in them etc.
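To make the Part 1 question concrete, here is a minimal sketch of a signature probe that tells a bare disc apart from one carrying an MBR, md or LVM2 metadata. It is only an illustration, not how mdadm or blkid actually do it: `probe_signatures` is a hypothetical name, it works on an in-memory image rather than a real /dev node (a real probe would seek to each offset instead of reading the whole device), it assumes a little-endian 0.90 superblock as on x86, and it checks only md 0.90/1.1/1.2, the LVM2 label and the MBR boot signature. Note that 0x55AA also ends FAT boot sectors, so "mbr" is only a hint.

```python
import struct
from typing import List

MD_MAGIC = 0xA92B4EFC        # md superblock magic (0.90 and 1.x)
MD_RESERVED = 64 * 1024      # 0.90 superblock sits in the last 64 KiB-aligned block


def probe_signatures(image: bytes) -> List[str]:
    """Return the metadata signatures found in a raw device image (simplified)."""
    found = []
    size = len(image)

    def u32le(off: int):
        # Read a little-endian 32-bit word, or None if it falls outside the image.
        if off < 0 or off + 4 > size:
            return None
        return struct.unpack_from("<I", image, off)[0]

    # MBR boot signature 0x55 0xAA at bytes 510-511.
    if size >= 512 and image[510:512] == b"\x55\xaa":
        found.append("mbr")
    # md 1.1 superblock at offset 0, md 1.2 at offset 4096.
    if u32le(0) == MD_MAGIC:
        found.append("md-1.1")
    if u32le(4096) == MD_MAGIC:
        found.append("md-1.2")
    # md 0.90: size rounded down to 64 KiB, minus 64 KiB (tiny images ignored).
    if size >= 2 * MD_RESERVED:
        sb = (size // MD_RESERVED) * MD_RESERVED - MD_RESERVED
        if u32le(sb) == MD_MAGIC:
            found.append("md-0.90")
    # LVM2 label "LABELONE", normally in sector 1 but allowed in any of the first four.
    for sector in range(4):
        if image[sector * 512:sector * 512 + 8] == b"LABELONE":
            found.append("lvm2")
            break
    return found or ["none"]
```

A result of `["none"]` is the fresh-disc case, but it is also what a blank USB stick would report, so a probe like this cannot on its own decide that a device is meant for RAID.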
A newly hotplugged device also has a "path" which we can see in /dev/disk/by-path. This is somehow indicative of a physical location. This path may be the same as the path of a device which was recently removed. It might be one of a set of paths which make up a "RAID chassis". It might be one of a set of paths on which we happen to find other RAID arrays.
Indeed, I would like to be able to declare any /dev/disk/by-path/pci-0000:00:1f.2-scsi-[0-4] to be suitable candidates for hot-plugging, because those are the 5 motherboard SATA ports I've hooked into my hot-swap chassis.
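Declaring bays by path could be as simple as a list of glob patterns matched against by-path names, combined with a check for a path that was recently vacated. The sketch below is hypothetical throughout: `HOTSWAP_PATTERNS`, `classify_path` and the "same-slot"/"candidate" labels are invented for illustration, and the pattern assumes by-path names of the form `pci-0000:00:1f.2-scsi-N:0:0:0`, whose exact suffix differs between udev versions.

```python
import fnmatch
import os

# Hypothetical policy: glob patterns over /dev/disk/by-path names marking hot-swap bays.
HOTSWAP_PATTERNS = ["pci-0000:00:1f.2-scsi-[0-4]:*"]


def classify_path(by_path_name, patterns, recently_removed):
    """Classify a newly hotplugged disc by its /dev/disk/by-path name."""
    name = os.path.basename(by_path_name)
    if "-part" in name:
        return "ignore"        # partition links: decide on the whole disc instead
    if name in recently_removed:
        return "same-slot"     # replaces a device that was just pulled from this path
    if any(fnmatch.fnmatchcase(name, p) for p in patterns):
        return "candidate"     # sits in a declared hot-swap bay
    return "ignore"            # CD-ROMs, USB sticks, card readers, ...
```

Keeping `recently_removed` separate from the static patterns means a replacement in the same slot can be treated more aggressively than an arbitrary disc that merely lands in a hot-swap bay.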
As an aside, I just tried yanking and replugging one of my drives, on CentOS 5.4, and it successfully went away and came back again, but wasn't automatically re-added, even though the metadata etc was all there.
Somehow from all of that information we need to decide if md can use the device without asking, or possibly with a simple yes/no question, and we need to decide what to actually do with the device. Options for what to do with the device include:
- write an MBR and partition table, then do something as below with each partition
Definitely want this for bare drives. In my case I'd like the MBR and first 62 sectors copied from one of the live drives, or a copy saved for the purpose, so the disc can be bootable.
My concern is that this is surely outwith the regular scope of mdadm/mdmon, as is handling bare drives/CD-ROMs/USB sticks. Do we need another mdadm companion rather than an addition?
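Copying the boot area itself is simple; the hard part is the policy around it. A minimal sketch, assuming 512-byte sectors and that the destination is the bare disc: it copies sector 0 (MBR and partition table) plus the 62 sectors after it, 63 × 512 = 32,256 bytes, and writes in place so nothing beyond that is touched. `copy_boot_area` is a hypothetical helper, not part of mdadm.

```python
import os

SECTOR_SIZE = 512          # assumes 512-byte logical sectors
BOOT_AREA_SECTORS = 63     # MBR (sector 0) plus the 62 sectors after it


def copy_boot_area(src_path, dst_path, sectors=BOOT_AREA_SECTORS, sector_size=SECTOR_SIZE):
    """Copy the MBR and following sectors from src to dst; return bytes copied."""
    length = sectors * sector_size
    with open(src_path, "rb") as src:
        data = src.read(length)
    if len(data) != length:
        raise ValueError(f"source is shorter than {length} bytes")
    # "r+b" writes in place without truncating, so the rest of dst is untouched.
    with open(dst_path, "r+b") as dst:
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())
    return length
```

Because the MBR carries the partition table and the 32-bit disk signature, the new disc ends up with an identical layout and a duplicate signature, and the kernel has to re-read the partition table (for example with `blockdev --rereadpt`) before the new partitions appear.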
- include the device (or partition) in an array that it was previously part of, but from which it was removed
Definitely, just so I can pull a drive and plug it in again and point and say ooh, everything's up and running again, to demonstrate how cool Linux md is. I imagine some distros' udev/hotplug rules do this already, almost by default where they assemble arrays incrementally.
- include the device or partition as a spare in a native-metadata array.
I think in my situation I'd quite like the first partition, type fd, metadata 0.90, RAID-1, mounted as /boot, added as an active mirror not a spare, again so that if this new drive appears as sda at the next power cycle, the system will boot.
The second partition, a RAID-5 with LVM on it, could be added as a spare, because it would then automatically be rebuilt onto if the array was degraded.
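The per-partition choices above can be written down as a small policy function. This is a hypothetical sketch: the `Partition` fields, `choose_action`, `plan` and the action names are invented for illustration, and it assumes the new disc's layout was cloned from a live drive so each partition has a "target" array. "add-active" stands for adding the partition and then growing the RAID-1 onto it, since md treats a device added to a healthy mirror as a spare.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    number: int
    target_array: Optional[str]    # array the matching partition on a live drive is in
    target_level: Optional[str]    # its RAID level, e.g. "raid1" or "raid5"
    target_mount: Optional[str]    # where that array is mounted, e.g. "/boot"
    previous_member: bool = False  # metadata says it was removed from target_array


def choose_action(p: Partition) -> str:
    """Pick one hypothetical action for a single partition."""
    if p.previous_member:
        return "re-add"        # a member coming back: resync it into its old slot
    if p.target_level == "raid1" and p.target_mount == "/boot":
        return "add-active"    # grow the mirror so this disc can boot on its own
    if p.target_array is not None:
        return "add-spare"     # e.g. RAID-5 under LVM: rebuilt onto when degraded
    return "ignore"            # no array wants it: leave it alone


def plan(disk_is_bare: bool, partitions: List[Partition]) -> List[str]:
    """Order the steps for one hotplugged disc."""
    steps = []
    if disk_is_bare:
        steps.append("copy-boot-area")  # MBR + partition table from a live drive
    for p in partitions:
        steps.append(f"part{p.number}: {choose_action(p)}")
    return steps
```

Checking `previous_member` first means a drive that is simply pulled and replugged goes straight back to its old role, which covers the yank-and-replug case before any of the fresh-disc rules apply.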
Part 2.
[...]

I'm afraid I have nothing to add here, it all sounds good.

Cheers,
John.