On Tue, 2008-10-28 at 10:36 +1100, Neil Brown wrote:
> On Monday October 27, dledford@xxxxxxxxxx wrote:
> >
> > I've found the udev rules method of starting md devices to be
> > problematic (at best).
> >
> > Here's the issue (in Fedora at least). Starting devices via udev means
> > starting them as soon as they are capable and not waiting until all
> > devices are up and running. You have to do this in case the device is
> > in a degraded state and you aren't going to get all the devices.
> > However, we don't create a bitmap on devices by default in the installer
> > (a user can add one themselves, but it isn't there by default). Without
> > the bitmap, if the device is written to before all devices are added, it
> > triggers a full resync of the device. As it turns out, for certain
> > installations, this happens on *every* single reboot. It's painful, to
> > say the least. So, I wanted to change the udev rule to work slightly
> > differently. I wanted the invocation of mdadm --incremental that
> > happened to be the one that took the array from an unrunnable state to a
> > runnable but degraded state to sleep for say 2 to 5 seconds, and then if
> > the array is still not up and running due to subsequent udev rule
> > invocations, it would start the array in a degraded state. This,
> > however, breaks udevsettle. So, the current setup (for the upcoming
> > Fedora 10) is done such that the udev rule won't start any degraded
> > arrays, and instead we have both a specific mdadm invocation in the
> > initrd and another in rc.sysinit that will start any degraded arrays
> > that are also listed in the mdadm.conf file. This makes sure that known
> > arrays are assembled and started if at all possible, but we only start
> > unknown arrays if they are complete.
> >
>
> This is using udev to start md devices, which is not quite the focus
> of the previous discussion. That was more about using udev to create
> the entries in /dev when someone else started the arrays.

True enough, although I think the two are related: udev rules on block
devices trigger the mdadm -I invocations, which in turn trigger the new
mdadm devices. So the issue of creating devices from udev rules is at
least mildly related to how mdadm gets called in the first place. That
is especially true for hot plugging, which you brought up in your mail,
since hot plugging is specifically a case of udev kicking mdadm off.

> However this is still a real issue that I would like to handle as best
> we can.
>
> I would like to get the md code to always have at least an in-memory
> bitmap to allow quick resync after a "re-add".
>
> However even this isn't a perfect solution as there is a window when a
> single device failure can kill an array.
>
> Your solution sounds good, but I'd be happy to hear other thoughts on
> the issue.

I ended up coding this up. It took quite a bit more touchup in the
incremental code than I expected. In general, mdadm-2.6.7.1 doesn't do
a very good job of honoring information in /etc/mdadm.conf when doing
incremental assembly. So, momentarily I'll send you a patch
series/pull request with the changes.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
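
For readers following the thread, the "sleep for say 2 to 5 seconds, then
start degraded" idea described in the quoted text can be sketched roughly
as below. This is only an illustration of the timing logic, not the code
in the patch series: the function name, its parameters, and the two
callbacks are hypothetical, and the sketch does not call mdadm or read any
kernel state. Note also that, as the quoted text says, sleeping inside a
udev rule breaks udevsettle, so the sketch says nothing about where such a
wait could safely run.

```python
import time
from typing import Callable


def start_after_grace_period(
    is_fully_assembled: Callable[[], bool],   # hypothetical: True once all members are present
    start_degraded: Callable[[], None],       # hypothetical: starts the array without missing members
    grace_seconds: float = 5.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Give late-arriving members a short window before running degraded.

    Returns "complete" if the array filled in during the wait, or
    "degraded" if the window expired and start_degraded() was called.
    """
    polls = max(1, int(round(grace_seconds / poll_interval)))
    for _ in range(polls):
        if is_fully_assembled():
            # Later device events completed the array; nothing more to do.
            return "complete"
        sleep(poll_interval)

    # Final check after the last sleep, then fall back to a degraded start.
    if is_fully_assembled():
        return "complete"
    start_degraded()
    return "degraded"
```

The callbacks are injected so the waiting logic can be exercised without
real devices; the 5 second default simply mirrors the upper end of the
range mentioned above.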