Re: Best practices for handling drive failures during a run?

On Thu, Sep 22, 2022 at 7:11 PM Nick Neumann <nick@xxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Sep 10, 2022 at 3:28 AM Damien Le Moal
> <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
> > You can use write-long, to "destroy" sectors: you will get errors when
> > attempting to read the affected sectors. But that is a really big hammer. A
> > simpler solution is to use dm-flakey to create "soft" IO errors.
>
> Thank you for mentioning this - I'm not a linux veteran so I did not
> know about these tools.
>
> I tried dm-flakey, but when the device is down, the errors are
> returned immediately. I also looked at dm-delay, and that actually
> worked pretty well for getting fio to sit and wait on an I/O.
>
> Unfortunately I have a hard time getting the delay to be "big". The
> time it takes to add the delay rule appears to be a linear function of
> the amount of delay, with a very big constant factor. A half second
> delay takes 11 seconds to add, and a 5 second delay takes 112 seconds:
> sudo time dmsetup create test9 --table "0 1024 delay /dev/nullb1 0 500 /dev/nullb1 0 0"
> 0.00user
> 0.00system
> 0:11.28elapsed
> ...
> sudo time dmsetup create test10 --table "0 1024 delay /dev/nullb1 0 5000 /dev/nullb1 0 0"
> 0.00user
> 0.00system
> 1:52.70elapsed
>
> And unfortunately something breaks at some point, as my attempt to do
> a 70 second delay had not finished after 2 hours. I'm experimenting
> right now to try to find a smaller but still big value that is useful
> for testing the nvme timeout/retry defaults. I've seen code snippets
> online though that set the delay to 100 seconds, so I'm at a loss why
> the time to do it is growing so large on my system.
>

Hi Nick,

If you're trying to create an error at an arbitrary location and at an
arbitrary time, you might be interested in the dm-dust target. The
admin-guide documentation for dm-dust [1] describes the command
interface (driven through "dmsetup message") that you use to set up a
specific failure scenario on a test device.
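
As a rough sketch of what that could look like (not something I have
run against your setup): the device /dev/nullb1 and the 1024-sector
length are taken from your dm-delay commands, while the target name
"dust1", the 512-byte block size and bad block number 60 are only
illustrative assumptions. The script prints the commands by default;
set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch only. DEV, NAME, SECTORS, BLKSZ and BADBLOCK are illustrative
# assumptions, not values from a tested setup.
DEV=${DEV:-/dev/nullb1}
NAME=${NAME:-dust1}
SECTORS=${SECTORS:-1024}
BLKSZ=${BLKSZ:-512}
BADBLOCK=${BADBLOCK:-60}

# Build a dm-dust table line:
#   <start> <length> dust <device> <offset> <block size in bytes>
dust_table() {
    printf '0 %s dust %s 0 %s\n' "$2" "$1" "$3"
}

# Print commands by default; set DRY_RUN=0 (as root) to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create the target on top of the test device.
run dmsetup create "$NAME" --table "$(dust_table "$DEV" "$SECTORS" "$BLKSZ")"
# Mark one block as bad, then turn on the failure behavior.
run dmsetup message "$NAME" 0 addbadblock "$BADBLOCK"
run dmsetup message "$NAME" 0 enable
# ... run fio against /dev/mapper/$NAME here ...
# Turn failures back off, clear the list, and tear the target down.
run dmsetup message "$NAME" 0 disable
run dmsetup message "$NAME" 0 clearbadblocks
run dmsetup remove "$NAME"
```

Since "enable" and "disable" are plain messages, they can be sent from
another shell while fio is running, which is how you get the
"arbitrary time" part.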


Thanks,

Bryan

[1] https://www.kernel.org/doc/html/v5.19/admin-guide/device-mapper/dm-dust.html



