Re: RFC: mdadm and bringing up raid sets from initrd (dracut)

On Wednesday July 15, dan.j.williams@xxxxxxxxx wrote:
> [ Cc: Neil ]
> 
> On Tue, Jul 14, 2009 at 7:30 AM, David Zeuthen<david@xxxxxxxx> wrote:
> > On Tue, 2009-07-14 at 12:59 +0200, Hans de Goede wrote:
> >> Currently the udev rules use incremental assembly like this:
> >> mdadm -I /dev/mdraid-member
> >>
> >> There are 2 problems with this:
> >> 1) When doing this for native mdraid metadata arrays, if only
> >>     one disk is present the set never gets activated
> >> 2) When doing this for imsm metadata arrays, as soon as the
> >>     first disk is incrementally added, the set gets activated
> >>     in degraded mode and stays that way, the second disk
> >>     will get added to the container, but not to the actual
> >>     sets in the container
> >
> > FWIW, this incremental assembly business in mdadm is actually not a very
> > good idea. At least not the current implementation. I'm not sure whether
> > it's still a Fedora-ism or whether it's something that's in upstream
> > mdadm yet. I'm talking about this udev rule
> >
> >  /lib/udev/rules.d/65-md-incremental.rules:
> >  # This file causes block devices with Linux RAID (mdadm) signatures to
> >  # automatically cause mdadm to be run.
> >  # See udev(8) for syntax
> >
> >  SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
> >        IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
> >        RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
> >
> > For example if the user plugs in a random old disk that happens to
> > contain half of a RAID1 mirror, then the incremental assembly bits set
> > up an inert md-device and the user is now left to his own devices to
> > sort this out when he's told by partitioning tools etc. that the disk
> > (or a partition of it) he just plugged in is "busy" (it is claimed by
> > the inert md node).
> >
> > I actually had to add some extra code to the GNOME Disk Utility bits to
> > handle such things (stop inert md devices) - makes the user experience
> > quite a bit worse since there's now an extra state to worry about. And
> > most current users don't use the UI bits yet for this so they get extra
> > confused when trying to use e.g. parted(8) or fdisk(8) on the device.
> >
> > FWIW, I'd wish people would stop playing games like this. If you want to
> > do auto-assembly at the system-level, at the very least don't leave the
> > system in a state like this. For example, one way to do auto-assembly
> > without such bugs would be to use libudev to enumerate all md component
> > devices with the same MD_UUID. Then you count the number of components
> > and only start the array if the number of components equals MD_DEVICES.
> > That's much better than incrementally adding to an md device node that
> > might never get used.

Yes:  auto-assembly is hard, and easy to get wrong.

While I don't claim that the current scheme is at all perfect, I don't
think your suggestion is a clear improvement.
The whole point of RAID is to survive drive failure, and that includes
drives being missing.
So I don't think "completely ignore the array if not all expected
drives are present" is the correct answer.
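
To make the two positions concrete, here is a minimal Python sketch of the
decision each policy would take. It assumes member properties arrive as
KEY=VALUE lines of the kind "mdadm --examine --export" prints, and uses only
the MD_UUID and MD_DEVICES keys mentioned above. The function names and the
min_needed parameter are hypothetical, invented for illustration; this is not
how mdadm or the udev rule actually decides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_export(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (the shape of `mdadm --examine --export` output)."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def group_by_uuid(members: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group member devices by the array they claim to belong to."""
    arrays: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for props in members:
        arrays[props["MD_UUID"]].append(props)
    return dict(arrays)


def decide(found: int, expected: int, min_needed: int) -> str:
    """Hypothetical policy: what to do with an array given the members seen so far.

    found      -- members currently present
    expected   -- MD_DEVICES recorded in the metadata
    min_needed -- fewest members the policy accepts (an assumption of this
                  sketch, e.g. 1 for a two-disk RAID1)
    """
    if found >= expected:
        return "start"
    if found >= min_needed:
        return "start-degraded"
    return "wait"
```

Setting min_needed equal to expected gives the "all components or nothing"
rule, so a lone RAID1 half is left alone. Setting it to 1 for a two-disk
mirror lets the array start degraded, which preserves redundancy semantics
but is also exactly the case where a stray old disk gets claimed.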

It is very easy to remove unwanted raid metadata 
(mdadm --zero-superblock), and making that easily accessible from a
GUI would probably be a good and useful thing, and might solve some
problems for some people.

One thing that I have contemplated is for md to not claim exclusive
ownership of drives until the array is activated and switched to
read-write.  That would address the 'my drive was stolen by md'
problem, but it may well create other problems in its place.

My general goal at present is to make mdadm sufficiently flexible that
a distro can choose a suitable policy and implement it.  If someone comes
up with a policy that works convincingly well, I could then make that
the default approach that mdadm takes.
There is certainly still room for improvement and I am happy to
discuss possibilities.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html