I was wondering if there are any recommendations or suggestions for handling drive failures during a fio run. I hit one yesterday during a 60-second mixed-use test on an SSD. 51 seconds in, the drive essentially stopped responding. (A separate program that periodically calls smartctl to get the drive state also showed something was wrong: data such as the temperature was missing.) At 107 seconds, a read completed and fio exited.

It made me wonder what would have happened if the test had not been time-limited, e.g., a full drive write. Would fio have just hung, waiting forever? Or would the OS eventually report back that the submitted operations had failed, so that fio would exit? Any ideas on ways to test this behavior, or areas of the code to look at?

I'm basically looking for input on how to make sure fio does not hang in such situations. Even better would be if I could get fio to return an error when it does happen. I could see the error reporting being configurable, e.g., if an operation doesn't complete within N seconds, stop the job and return an error.

I'm happy to work on implementing things to help with this, and wanted to see where things currently stand and what others think about the general issue.

Thanks,
Nick
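To make the "if an operation doesn't complete within N seconds" idea a bit more concrete, here is a minimal, hypothetical sketch in C of the check such an option might perform. Everything in it (struct pending_io, pending_io_submit, find_stuck_io, and the timeout value) is made up for illustration and is not fio's actual code or option set; where this would hook into fio's own per-I/O tracking is exactly the open question above. The idea is simply to record a monotonic timestamp at submission and have a periodic check flag any I/O that has been outstanding longer than the limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-I/O bookkeeping; NOT fio's real structures. */
struct pending_io {
    bool in_flight;          /* submitted but not yet completed */
    struct timespec issued;  /* monotonic time of submission */
};

/* Seconds elapsed between two timespecs (may be fractional). */
static double elapsed_sec(const struct timespec *start,
                          const struct timespec *now)
{
    return (double)(now->tv_sec - start->tv_sec) +
           (double)(now->tv_nsec - start->tv_nsec) / 1e9;
}

/* Mark an I/O as submitted, stamping it with the monotonic clock. */
static void pending_io_submit(struct pending_io *io)
{
    io->in_flight = true;
    clock_gettime(CLOCK_MONOTONIC, &io->issued);
}

/* Mark an I/O as completed so the watchdog ignores it. */
static void pending_io_complete(struct pending_io *io)
{
    io->in_flight = false;
}

/*
 * Watchdog check, meant to be called periodically (e.g. from a timer or
 * status thread). Returns the index of the first in-flight I/O that has
 * been outstanding longer than timeout_sec, or -1 if none has.
 */
static int find_stuck_io(const struct pending_io *ios, int n,
                         const struct timespec *now, double timeout_sec)
{
    assert(timeout_sec > 0);
    for (int i = 0; i < n; i++) {
        if (ios[i].in_flight && elapsed_sec(&ios[i].issued, now) > timeout_sec)
            return i;
    }
    return -1;
}
```

In the scenario above, an I/O submitted around the 51-second mark and still outstanding at, say, 81 seconds would be reported by find_stuck_io with a 30-second limit, at which point the job could be stopped and a non-zero exit status returned instead of waiting for the device (or the kernel's own timeout handling) to give up.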