Junaid Rizvi wrote:

> What about mdctl --monitor ?

I hadn't heard of this. A web search finds some references to it. It is not mentioned in the two pieces of doco I am relying on:

  http://www.linuxdoc.org/FAQ/Linux-RAID-FAQ/
  Linux-RAID FAQ
  Gregory Leblanc gleblanc (at) cu-portland.edu
  Revision v0.0.10 24 April 2001 Revised by: gml

  http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html
  The Software-RAID HOWTO
  Jakob Østergaard (jakob@ostenfeld.dk)
  v. 0.90.7 19th of January 2000

I don't have this program on my computer. I am approaching this as a user, installing Red Hat 7.2, rather than someone who is involved in programming the RAID code or interested in its internals.

After searching this mailing list, I found that mdctl lives about 1000 km from here:

  http://www.cse.unsw.edu.au/~neilb/source/mdctl/

and has been discussed on this list since June last year. I read some messages on this list but do not follow all threads.

  mdctl is a single program that can be used to control Linux md
  devices. It is intended to provide all the functionality of the
  mdtools and raidtools but with a very different interface.

  mdctl can perform all functions without a configuration file.
  There is the option of using a configuration file, but not in
  the same way that raidtools uses one.

  raidtools uses a configuration file to describe how to create a
  RAID array, and also uses this file partially to start a
  previously created RAID array. Further, raidtools requires the
  configuration file for such things as stopping a raid array,
  which needs to know nothing about the array.

After downloading the source and looking at the man page, I could find no such option "--monitor".
Looking further, in ReadMe.c I found something on a "--monitor" option, which I think is a synonym for "Follow":

  For follow/monitor:
    --mail=     -m  : Address to mail alerts of failure to
    --program=  -p  : Program to run when an event is detected
    --alert=        : same as --program
    --delay=    -d  : seconds of delay between polling state. default=60

Yes - it looks like mdctl can keep an eye on the RAID system and mail reports and run programs when something goes wrong. I will investigate further. I would like a way of ensuring the report system really works, without actually having a RAID failure. I suppose I could doctor the source to achieve this.

Looking at Monitor.c, it does not yet add anything to the system logs if there is a failure. The description in Monitor.c reads:

  Every few seconds, scan every md device looking for changes
  When a change is found, log it, possibly run the alert command,
  and possibly send Email

  For each array, we record:
     Update time
     active/working/failed/spare drives
     State of each device.

  If the update time changes, check out all the data again
  It is possible that we cannot get the state of each device
  due to bugs in the md kernel module.

  if active_drives decreases, generate a "Fail" event
  if active_drives increases, generate a "SpareActive" event

  if we detect an array with active<raid and spare==0
  we look at other arrays that have same spare-group
  If we find one with active==raid and spare>0,
    and if we can get_disk_info and find a name
    Then we hot-remove and hot-add to the other array

This last paragraph seems to replicate what I thought was the automatic function of the existing RAID software - to add in a spare if necessary. I would want to be sure that there was no conflict.

This RAID stuff is critical and hard to test realistically. When I first got RAID1 going on a RH6.0 installation, I tested it by unplugging one of the drives whilst compiling the kernel.
There was a flurry of error messages for quite a while, but the system kept running perfectly, which greatly impressed me. Rebooting with the drive plugged in caused it to be automatically resynched - which was also impressive. That has been the extent of my testing.

- Robin
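
The event rules in the Monitor.c description quoted above can be sketched in a few lines. This is only a rough illustration in Python, not mdctl's C code: ArrayState, detect_events, needs_spare and can_donate_spare are hypothetical names, and the fields are assumed to correspond to the update time and drive counts the description mentions.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical snapshot of one md array (field names are assumptions)."""
    update_time: int   # last time the array's state was updated
    raid_disks: int    # number of drives the array is meant to have ("raid")
    active: int        # drives currently active
    spare: int         # spare drives attached to this array


def detect_events(old: ArrayState, new: ArrayState) -> list:
    """Return the event names implied by the change between two polls."""
    if new.update_time == old.update_time:
        return []                      # nothing changed since the last poll
    events = []
    if new.active < old.active:
        events.append("Fail")          # active_drives decreased
    elif new.active > old.active:
        events.append("SpareActive")   # active_drives increased
    return events


def needs_spare(state: ArrayState) -> bool:
    """Degraded array with no spare of its own (active<raid and spare==0)."""
    return state.active < state.raid_disks and state.spare == 0


def can_donate_spare(state: ArrayState) -> bool:
    """Healthy array holding an unused spare (active==raid and spare>0)."""
    return state.active == state.raid_disks and state.spare > 0
```

Feeding hand-made snapshots into a function like detect_events exercises the decision logic without a real drive failure. It says nothing about whether mdctl's own mail or program alerts are delivered; that would still need to be checked against mdctl itself.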