Re: Auto Rebuild on hot-plug

On 25/03/2010 00:35, Neil Brown wrote:
Greetings.
 I find myself in the middle of two separate off-list conversations on the
 same topic and it has reached the point where I think the conversations
 really need to be united and brought on-list.

 So here is my current understanding and thoughts.

 The topic is about making rebuild after a failure easier.  It strikes me as
 particularly relevant after the link Bill Davidsen recently forwarded to the
 list:

       http://blogs.techrepublic.com.com/opensource/?p=1368

 The most significant thing I got from this was a complaint in the comments
 that managing md raid was too complex and hence error-prone.

 I see the issue as breaking down into two parts.
  1/ When a device is hot plugged into the system, is md allowed to use it as
     a spare for recovery?
  2/ If md has a spare device, what set of arrays can it be used in if needed.

 A typical hot plug event will need to address both of these questions in
 turn before recovery actually starts.

 Part 1.

  A newly hotplugged device may have metadata for RAID (0.90, 1.x, IMSM, DDF,
  other vendor metadata) or LVM or a filesystem.  It might have a partition
  table which could be subordinate to or super-ordinate to other metadata.
  (i.e. RAID in partitions, or partitions in RAID).  The metadata may or may
  not be stale.  It may or may not match - either strongly or weakly -
  metadata on devices in currently active arrays.

Or indeed it may have no metadata at all - it may be a fresh disc. I didn't see that you stated this specifically at any point, though it was there by implication, so I will: you're going to have to pick up hotplug events for bare drives, which presumably means you'll also get events for CD-ROM drives, USB sticks, printers with media card slots in them etc.
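
For what it's worth, filtering those out before md ever sees them looks tractable at the udev level; a rough, untested sketch (property names as udev's cdrom_id helper and the kernel's "removable" sysfs attribute provide them):

    # sketch only -- skip anything that clearly isn't a candidate disc
    SUBSYSTEM!="block", GOTO="md_hotplug_end"
    ACTION!="add", GOTO="md_hotplug_end"
    ENV{ID_CDROM}=="1", GOTO="md_hotplug_end"
    ATTR{removable}=="1", GOTO="md_hotplug_end"
    # ...whatever does the real policy work goes here...
    LABEL="md_hotplug_end"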

  A newly hotplugged device also has a "path" which we can see
  in /dev/disk/by-path.  This is somehow indicative of a physical location.
  This path may be the same as the path of a device which was recently
  removed.  It might be one of a set of paths which make up a "RAID chassis".
  It might be one of a set of paths on which we happen to find other RAID
  arrays.

Indeed, I would like to be able to declare any /dev/disk/by-path/pci-0000:00:1f.2-scsi-[0-4] to be suitable candidates for hot-plugging, because those are the 5 motherboard SATA ports I've hooked into my hot-swap chassis.
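
A single udev rule would more or less express that, I think; a sketch, assuming the ID_PATH property matches what shows up under /dev/disk/by-path, and leaning on mdadm's --incremental mode:

    # only drives on those five ports get handed to md (sketch, untested)
    SUBSYSTEM=="block", ACTION=="add", ENV{ID_PATH}=="pci-0000:00:1f.2-scsi-[0-4]*", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"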

As an aside, I just tried yanking and replugging one of my drives, on CentOS 5.4, and it successfully went away and came back again, but wasn't automatically re-added, even though the metadata etc was all there.
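
The manual equivalent does work, which is presumably what the hotplug machinery should have done for me; something along these lines (device and array names made up):

    mdadm --examine /dev/sdc1           # metadata still names the array it belongs to
    mdadm /dev/md0 --re-add /dev/sdc1   # put it back into that array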

  Somehow, from all of that information, we need to decide if md can use the
  device without asking, or possibly with a simple yes/no question, and we
  need to decide what to actually do with the device.

  Options for what to do with the device include:
    - write an MBR and partition table, then do something as below with
      each partition

Definitely want this for bare drives. In my case I'd like the MBR and first 62 sectors copied from one of the live drives, or a copy saved for the purpose, so the disc can be bootable.
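
For identical drives that is only a couple of commands today; a sketch with made-up device names, sda being a live drive and sdc the newcomer:

    # copy the MBR plus the 62 sectors after it (boot code and partition table)
    dd if=/dev/sda of=/dev/sdc bs=512 count=63
    # or, if only the partition table is wanted:
    sfdisk -d /dev/sda | sfdisk /dev/sdc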

My concern is that this is surely outwith the regular scope of mdadm/mdmon, as is handling bare drives/CD-ROMs/USB sticks. Do we need another mdadm companion rather than an addition?

    - include the device (or partition) in an array that it was previously
      part of, but from which it was removed

Definitely, just so I can pull a drive and plug it in again and point and say ooh, everything's up and running again, to demonstrate how cool Linux md is. I imagine some distros' udev/hotplug rules do this already, almost by default where they assemble arrays incrementally.
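
I imagine those rules boil down to something of this shape, keying on the RAID metadata blkid already found rather than on physical location (a guess at the shape, not a quote from any particular distro):

    SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"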

    - include the device or partition as a spare in a native-metadata array.

I think in my situation I'd quite like the first partition (type fd, 0.90 metadata, a RAID-1 mounted as /boot) added as an active mirror rather than a spare, again so that if this new drive appears as sda at the next power cycle, the system will still boot.

The second partition, part of a RAID-5 with LVM on it, could be added as a spare, because the array would then automatically be rebuilt onto it if it were degraded.
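
With mdadm as it stands the two cases come out roughly like this (names and device counts made up): if the /boot RAID-1 is degraded, a plain --add rebuilds onto the new partition and it ends up active anyway; if it is already complete you have to grow the mirror count as well, otherwise it just sits there as a spare. The RAID-5 member really does only need the plain --add.

    mdadm /dev/md0 --add /dev/sdc1          # /boot mirror: active after rebuild if md0 was degraded
    mdadm --grow /dev/md0 --raid-devices=3  # ...or grow the mirror count if md0 was already complete
    mdadm /dev/md1 --add /dev/sdc2          # RAID-5: stays a spare until md1 needs it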

 Part 2.
[...]

I'm afraid I have nothing to add here; it all sounds good.

Cheers,

John.
