<background> "Events" are mechanism that device-mapper kernel targets use to signal user-space. An event can be raised for any number of reasons, including: a mirror becomes in-sync, an I/O error has occurred to a device in a mirror, a snapshot has become full - anything that may warrant user-space interest. User-space can wait on a DM device (using 'dmsetup wait <device> [<event_nr>]') for an event to take place; and once an event is received, take action. It is always prudent to check the status output of the DM device once an event is received to ensure you are taking the appropriate action. Since devices can raise an event for a variety of reasons, do not presuppose a particular reason for an event. There exists a daemon ('dmeventd') specifically designed to listen for events. Devices are registered with the daemon along with the name of a Dynamic Shared Object (aka DSO, aka runtime library) via a library interface - libdevmapper-event. The daemon "wait"s on the device for an event and uses the DSO to process it. For example, when LVM creates a mirror, it registers the mirror device with the daemon, specifying "libdevmapper-event-lvm2mirror.so" as the DSO to use for processing events. (If users don't like the way the default DSO handles events, they can even specify their own.) </background> Currently, the mirror DSO - libdevmapper-event-lvm2mirror.so - is limited in what it does. (Find the code in LVM2/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c) It will tell you when a mirror becomes "in-sync" and it will remove a device that suffers an I/O error - regardless of how or why. It is the last part that needs improving... We now have the ability to detect the type of error that was encountered by the mirror. After an event, we can get the status of the mirror, which will look something like this: "0 41943040 mirror 2 254:3 254:4 40960/40960 1 AA 3 clustered_disk 254:2 A" The 'A's signify that the disks are "alive". Looking at 'linux-2.6/drivers/md/dm-raid1.c' we find the other possible states: * A => Alive - No failures * D => Dead - A write failure occurred leaving mirror out-of-sync * S => Sync - A sychronization failure occurred, mirror out-of-sync * R => Read - A read failure occurred, mirror data unaffected We can do so much more with this information than the immediate removal of an offending device. 'S' could cause us to simply suspend/resume the device to restart the resynchronization process - giving us another shot at it. 'R' could mean that we have a unrecoverable read error - a block relocation might be initiated via a write. In the case of a 'D', we could wait some user configured amount of time (or %'age out of sync) before removing the offending device, as it could be a transient failure. A good DSO would also allow us to do proactive scans of RAID devices - spotting problems before they bite us. (Like the existence of an unrecoverable read error rendering a RAID5 useless - even before a drive has failed.) There are lots of possibilities here.... If I were to guess at the phases of development I would say they are: 1) Simplest working solution - DONE 2) Improve parsing of mirror status output in the DSO - Location => LVM2/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c - Be able to determine failure types (need more states then just 'ME_FAILURE') - At the very least, we improve the log messages at this phase and it sets us up to improve the handling of each error type - potentially ignoring some error types for now (like read failures). 
If I were to guess at the phases of development, I would say they are:

1) Simplest working solution - DONE

2) Improve parsing of mirror status output in the DSO
   - Location => LVM2/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
   - Be able to determine failure types (we need more states than just 'ME_FAILURE'; the classification sketch above is one way to get them)
   - At the very least, we improve the log messages at this phase, and it sets us up to improve the handling of each error type - potentially ignoring some error types for now (like read failures)

3) Implement different methods to handle the different error types (the suspend/resume sketch above is one example for 'S')

4) Transient fault handling
   - Since we can't just assume "wait 5 seconds and then see if the failure still exists", we are going to have to make this configurable (the grace-period sketch above shows the shape of it). Discussion should proceed on this in parallel with #2 and #3, since this phase will take a long time for everyone to agree on. We have to determine where the user specifies the configuration - lvm.conf? CLI? We also have to determine /what/ their configuration will be based on - time? percentage of the mirror out-of-sync?

5) Proactive scan
   - Even when software does everything right, your RAID volume can become inconsistent. Long seeks, adjacent-track erasure, problems with RAM or copying, unrecoverable read errors... it would be nice to spot these before they become a problem. We could use the DSO to perform proactive scanning - perhaps just a little of the storage at a time. Every so many days, you could be reassured that everything is in order.

brassow