I would like to introduce a SCSI fault injection framework using SystemTap.

Currently, the kernel has the Fault-injection framework and the Faulty mode for md, both of which can be used for testing error handling. However, they can only produce fixed types of errors at random. In order to simulate more realistic SCSI disk faults, I have created a new, flexible fault injection framework using SystemTap.

The new fault injection framework has the following features:

1) The new framework is flexible. Because the fault definitions are SystemTap scripts, the fault conditions can be changed easily without changing the kernel. For example, device faults resulting in a SCSI command timeout, and media faults which can be corrected by writing data to the failed sector, can be simulated using this framework.

2) The new framework generates "pseudo" faults in the SCSI mid-layer. Any upper-layer application or driver that uses the SCSI mid-layer can be tested with this framework.

3) The new framework rewrites the status code and sense data of a SCSI command and passes them to the upper layer, so the real error handling routine of the upper layer for an I/O request can be tested.

I have tested software RAID (md/dm-mirror) using this framework and found some bugs, e.g.:

- The kernel thread for md RAID1 can deadlock when the error handler for md RAID1 contends with a write access to the md RAID1 array.

- dm-mirror's redundancy doesn't work. A read error from a disk in the array is passed directly to userspace, without reading from the other mirror. (It turns out that this is a known issue, but the patch has not been merged: http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-raid1-handle-read-failures.patch)

There are also some other bugs in the error handling routines in multiple-fault situations. I will report the details of these bugs later.

The new framework has been tested on Fedora 8 (i386) running kernel 2.6.23.12.
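To make features 1) and 3) a little more concrete, here is a minimal sketch of the idea as a user-space Python model. It is not the SystemTap tool set itself: the FaultInjector class, its complete() method, and build_sense() are hypothetical names chosen for illustration. The only things taken from the SCSI standards are the constants: CHECK CONDITION status (0x02), the MEDIUM ERROR sense key (0x3), additional sense code 0x11 (unrecovered read error), and the fixed-format sense layout.

```python
# Illustrative user-space model only; the real framework works on
# SCSI commands inside the kernel via SystemTap probes.
SAM_STAT_GOOD = 0x00
SAM_STAT_CHECK_CONDITION = 0x02
MEDIUM_ERROR = 0x03            # sense key
ASC_UNRECOVERED_READ = 0x11    # additional sense code

READ_10, WRITE_10 = 0x28, 0x2A  # SCSI opcodes


def build_sense(sense_key, asc, ascq=0x00):
    """Return an 18-byte fixed-format sense buffer."""
    sense = bytearray(18)
    sense[0] = 0x70             # response code: current error, fixed format
    sense[2] = sense_key & 0x0F
    sense[7] = 10               # additional sense length (bytes 8..17)
    sense[12] = asc
    sense[13] = ascq
    return bytes(sense)


class FaultInjector:
    """Hypothetical stand-in for a probe that rewrites completed commands."""

    def __init__(self, bad_sectors):
        self.bad_sectors = set(bad_sectors)

    def complete(self, opcode, lba, count):
        """Return (status, sense) as the upper layer would see them."""
        touched = set(range(lba, lba + count))
        if opcode == WRITE_10:
            # Assumption: a write to a failed sector clears the fault,
            # modelling a media error that is corrected by rewriting.
            self.bad_sectors -= touched
            return SAM_STAT_GOOD, b""
        if opcode == READ_10 and touched & self.bad_sectors:
            # Rewrite the result: CHECK CONDITION plus MEDIUM ERROR sense.
            sense = build_sense(MEDIUM_ERROR, ASC_UNRECOVERED_READ)
            return SAM_STAT_CHECK_CONDITION, sense
        return SAM_STAT_GOOD, b""
```

With FaultInjector([100]), a READ(10) covering sector 100 comes back with CHECK CONDITION and MEDIUM ERROR sense data; after a WRITE(10) to sector 100, the same read completes with GOOD status. This is the kind of sequence an upper layer such as md RAID1 is expected to handle by rewriting the failed sector from the other mirror.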
I am currently cleaning up the tool set for release, and plan to post it in the near future. If you are interested, please take a look at it when it is posted. If you have any comments, please let me know.

-- 
Kenichi TANAKA
Open Source Software Platform Development Division
Computers Software Operations Unit, NEC Corporation
k-tanaka@xxxxxxxxxxxxx