On Monday August 4, rjoseph@jaw.lanl.gov wrote:
> Hello,
>
> So, my real question is: where in the blazes does the MD/RAID system actually,
> really, seriously detect a failed disk?! And when it does this, what is the
> path of function calls taken to say "hey, this disk is failed, don't use it!"?

(Talking 2.4 language here.)

When raid5 wants to schedule I/O on a block, it (among other things) sets
b_end_io to either raid5_end_read_request or raid5_end_write_request,
depending on the type of request, and then calls generic_make_request to
submit the request.

When the request completes, the b_end_io function will be called. The second
argument to the function ("uptodate") is '1' if the request was successful,
and '0' if it failed (no further details of the failure mode, sorry).

If uptodate == 0, md_error is called, which marks the device as faulty and
calls raid5_error, which marks it faulty again....

Eventually handle_stripe() gets called on the stripe again. It notices that
the data still needs to be read or written, but now that device is failed, so
it uses some other strategy to accomplish the desired goal.

Let me know if you need anything else clarified.

NeilBrown
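
The completion path described above can be sketched as a small user-space toy model. This is not kernel code: every type and every sim_* function is a hypothetical stand-in invented for illustration, and only the order of the calls (completion callback, then md_error, then raid5_error, then a later decision in handle_stripe) is taken from the description above.

```c
/* Toy user-space model of the failure path; NOT the 2.4 kernel code.
 * All sim_* names and both structs are hypothetical stand-ins. */
#include <assert.h>

#define SIM_NDISKS 3

struct sim_array {
    int faulty[SIM_NDISKS];  /* 1 once a member disk is marked failed */
    int error_calls;         /* times the personality error hook ran */
};

struct sim_request;
typedef void (*sim_end_io_t)(struct sim_request *req, int uptodate);

struct sim_request {
    struct sim_array *array;
    int disk;                /* member disk the I/O was sent to */
    sim_end_io_t end_io;     /* plays the role of b_end_io */
};

/* Plays the role of raid5_error: personality-level bookkeeping. */
void sim_raid5_error(struct sim_array *a, int disk)
{
    a->faulty[disk] = 1;     /* marked faulty a second time */
    a->error_calls++;
}

/* Plays the role of md_error: mark faulty, then tell the personality. */
void sim_md_error(struct sim_array *a, int disk)
{
    assert(disk >= 0 && disk < SIM_NDISKS);
    a->faulty[disk] = 1;
    sim_raid5_error(a, disk);
}

/* Plays the role of raid5_end_read_request / raid5_end_write_request. */
void sim_end_request(struct sim_request *req, int uptodate)
{
    if (!uptodate)           /* 0 means the I/O failed, no other detail */
        sim_md_error(req->array, req->disk);
}

/* Plays the role of generic_make_request plus the later completion;
 * disk_ok is the simulated outcome reported by the device. */
void sim_submit(struct sim_request *req, int disk_ok)
{
    req->end_io(req, disk_ok ? 1 : 0);
}

/* Plays the role of one pass through handle_stripe for a read.
 * Returns 1 if the block came straight from its disk, 0 if the caller
 * must fall back to another strategy because the disk is failed. */
int sim_read_block(struct sim_array *a, int disk, int disk_ok)
{
    struct sim_request req = { a, disk, sim_end_request };

    if (!a->faulty[disk])    /* never submit I/O to a failed disk */
        sim_submit(&req, disk_ok);
    return !a->faulty[disk];
}
```

In this model, calling sim_read_block(&a, 1, 0) on a zero-initialised array marks disk 1 faulty through the callback chain and returns 0; later calls for disk 1 skip the submission entirely and return 0 again, which is the point where the real handle_stripe would pick another strategy.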