> -----Original Message-----
> From: Nick Neumann [mailto:nick@xxxxxxxxxxxxxxxx]
> Sent: Wednesday, September 7, 2022 11:58 AM
> To: fio@xxxxxxxxxxxxxxx
> Subject: Best practices for handling drive failures during a run?
>
> I was wondering if there were any recommendations/suggestions on
> handling drive failures during a fio run. I hit one yesterday with a
> 60-second mixed-use test on an SSD. 51 seconds in, the drive basically
> stopped responding. (A separate program that periodically calls
> smartctl to get drive state also showed something was up, as data like
> temperature was missing.)
>
> At 107 seconds, a read completed, and fio exited.
>
> It made me wonder what would have happened if the test was not time
> limited - e.g., a full drive write. Would it have just hung, waiting
> forever? Or would the OS eventually have gotten back to fio and told it
> the submitted operations had failed, so that fio would exit?
>
> Any ideas on ways to test the behavior, or areas of the code to look at?

The null_blk device supports error injection via the badblocks configfs
attribute, so you could use it for testing. There is a guide to setting
up null_blk devices via configfs at
https://zonedstorage.io/docs/getting-started/nullblk

Vincent
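For illustration, here is a rough sketch of the setup, assuming root, a
kernel built with null_blk (CONFIG_BLK_DEV_NULL_BLK), and the configfs
interface described in the guide above; the device name "nullb0" and the
exact attribute values are examples, not requirements:

```shell
# Load null_blk with no pre-created devices so configfs controls them.
modprobe null_blk nr_devices=0

# Create and configure a device node in configfs.
mkdir /sys/kernel/config/nullb/nullb0
cd /sys/kernel/config/nullb/nullb0

echo 1    > memory_backed   # back the device with RAM so writes stick
echo 1024 > size            # device size (MB)
echo 4096 > blocksize       # block size (bytes)

# Inject errors: "+START-END" marks that sector range bad, so I/O
# touching it fails with an error; "-START-END" clears a range again.
echo "+0-1023" > badblocks

# Bring the device online as /dev/nullb0.
echo 1 > power
```

After that you could point fio at /dev/nullb0 (e.g. a randread job with
--direct=1) and observe how it reacts when requests hit the bad range.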