I have mdadm configured to run a script when an event occurs.
I start mdadm like this:

    mdadm --monitor --scan&

That is from a script in /etc/init.d.

My /etc/mdadm.conf file has this:

    PROGRAM /root/bin/handle-mdadm-events

The other lines are not related.

The script has these 2 lines:

    echo '$1'=$1 '$2'=$2 '$3'=$3 '$4'=$4 >> /root/bin/handle-mdadm-events.log
    (date;cat /proc/mdstat;mdadm --detail $2)|mail -s "md event: $1 $2 $3" bugzilla@xxxxxxxxxxxxxxxx

I have a test array with 8 disks, /dev/ram[0-7].

I ran this command:

    # mdadm /dev/md3 -f /dev/ram0
    mdadm: set /dev/ram0 faulty in /dev/md3

I waited, and I got 1 email:

    Fail /dev/md3 /dev/ram0

I ran these 2 commands:

    # mdadm /dev/md3 -a /dev/ram8
    mdadm: hot added /dev/ram8
    # mdadm /dev/md3 -a /dev/ram9
    mdadm: hot added /dev/ram9

Now I have 2 spares.

I waited, and I got this email:

    SpareActive /dev/md3 /dev/ram8

The Fail event and 4 others were missed.

Examples from a slower array:

    $1=Fail $2=/dev/md2 $3=/dev/sdq1 $4=
    $1=Rebuild20 $2=/dev/md2 $3= $4=
    $1=Rebuild40 $2=/dev/md2 $3= $4=
    $1=Rebuild60 $2=/dev/md2 $3= $4=
    $1=Rebuild80 $2=/dev/md2 $3= $4=
    $1=SpareActive $2=/dev/md2 $3=/dev/sdc1 $4=

I ran this command:

    # mdadm /dev/md3 -f /dev/ram1
    mdadm: set /dev/ram1 faulty in /dev/md3

No emails were generated. About 6 events were missed: the Fail and
SpareActive events, and the 4 Rebuild events.

I think the events were missed because the state changed, then changed
back, within 60 seconds, presumably between two polls of the monitor.

I don't recall ever missing an event on a "real" array, but with faster
disks and very small /boot partitions I believe it could easily happen.
My small partitions don't have spares.

Also, adds and removes don't generate events.

Also, if there is no spare, the console displays an extra warning:

    "md3: no spare disk to reconstruct array! -- continuing in degraded mode"

Maybe this event should also generate an email.
If there is a spare, the console displays this message:

    "md3: resyncing spare disk [dev 01:0e] to replace failed disk"

Maybe both of the above should generate emails. Otherwise you must wait
until the Rebuild20 event to know that there is a spare. Or I wait
forever if there is not a spare.

Just noticed while playing!
If I use MAILADDR and don't use PROGRAM, like this:

    MAILADDR bugzilla@xxxxxxxxxxxxxxxx
    # PROGRAM /root/bin/handle-mdadm-events

I don't get Fail events, but I do get some events, like SpareActive.
No! In another test I got the Fail event, but not the SpareActive.
In both of these tests I did wait 60 seconds or more!

And when I start monitor mode using PROGRAM I get these:

    $1=SparesMissing $2=/dev/md2 $3= $4=
    $1=SparesMissing $2=/dev/md3 $3= $4=
    $1=SparesMissing $2=/dev/md1 $3= $4=
    $1=SparesMissing $2=/dev/md0 $3= $4=

But when using MAILADDR I don't get them! And they are wrong!
/dev/md2 does have a spare, and sometimes md3 has one.

Also, if I use both PROGRAM and MAILADDR, I get some events from
MAILADDR and some from PROGRAM. I don't always get all events from both.
I have not tried this much, so I have no details.

Maybe md could save events in a queue, and mdadm --monitor could access
the queue. Maybe something like /proc/mdevents could be useful.

I am using kernel 2.4.28 and mdadm 1.8.0.

Guy
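
For anyone trying to reproduce the missed events, timestamping each call to the handler makes it easier to see how events line up against the 60-second wait. The following is only a sketch of how the two-line script described above could be restructured, not the original script. The names format_event, handle_event, LOG and MAILTO are made up for illustration; the log path and the (redacted) address are copied from the post.

```shell
#!/bin/sh
# Sketch of /root/bin/handle-mdadm-events (not the original script).
# mdadm --monitor calls the PROGRAM with: $1 = event name, $2 = md device,
# $3 = component device (empty for some events, as in the logs above).

# Both variables are illustrative; the defaults copy the values from the post.
LOG="${LOG:-/root/bin/handle-mdadm-events.log}"
MAILTO="${MAILTO:-bugzilla@xxxxxxxxxxxxxxxx}"

# Build one log line in the same '$1=... $2=...' layout used in the post.
format_event() {
    printf '$1=%s $2=%s $3=%s $4=%s\n' "${1:-}" "${2:-}" "${3:-}" "${4:-}"
}

# Log the event with a timestamp, then mail the current array state.
handle_event() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(format_event "$@")" >> "$LOG"
    (date; cat /proc/mdstat; mdadm --detail "$2") |
        mail -s "md event: $1 $2 $3" "$MAILTO"
}

# Only act when called with arguments, as mdadm does.
if [ "$#" -gt 0 ]; then
    handle_event "$@"
fi
```

To test the 60-second theory, the monitor could also be started with a shorter polling interval, for example `mdadm --monitor --scan --delay=5 &`, assuming the installed mdadm supports the --delay option.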