On Fri, Sep 9, 2022 at 10:56 PM Vincent Fu <vincent.fu@xxxxxxxxxxx> wrote:

> You could test your theory about max_retries by creating an NVMe fabrics
> loopback device backed by null_blk with error injection. Then try to access one
> of the bad blocks via the nvme device and see if the delay before fio sees
> the error depends on io_timeout and max_retries in the way that you expect.

I finally got a chance to try this. I had to learn enough about NVMe
fabrics to combine it with the null_blk bad-blocks setup from before. I
think I'm doing everything right, but I wonder whether I missed something,
because writing to the NVMe fabrics device "backed" by the null_blk device
behaves the same as writing to the null_blk device directly: an immediate
error and termination of fio.

Perhaps this shouldn't really surprise me, though, since the underlying
device never experiences a timeout; its error is immediately propagated to
the host over the NVMe fabric (apologies if I'm using any terminology
incorrectly).

This was my basic setup:

sudo modprobe null_blk nr_devices=0
sudo mkdir /sys/kernel/config/nullb/nullb0
echo 1 | sudo tee -a /sys/kernel/config/nullb/nullb0/memory_backed
echo "+1-100" | sudo tee -a /sys/kernel/config/nullb/nullb0/badblocks
echo 1 | sudo tee -a /sys/kernel/config/nullb/nullb0/power

# First fio run directly on the null_blk device returns an error immediately
sudo fio --filename=/dev/nullb0 --name=job --ioengine=libaio --direct=1 --size=1M --rw=rw --rwmixwrite=100 --bs=128K

sudo modprobe nvme_tcp
sudo modprobe nvmet-tcp

sudo mkdir /sys/kernel/config/nvmet/subsystems/nvmet-test
cd /sys/kernel/config/nvmet/subsystems/nvmet-test
echo 1 | sudo tee -a attr_allow_any_host
sudo mkdir namespaces/1
cd namespaces/1
echo -n /dev/nullb0 | sudo tee -a device_path
echo 1 | sudo tee -a enable

sudo mkdir /sys/kernel/config/nvmet/ports/1
echo 127.0.0.1 | sudo tee -a /sys/kernel/config/nvmet/ports/1/addr_traddr
echo tcp | sudo tee -a /sys/kernel/config/nvmet/ports/1/addr_trtype
echo 4420 | sudo tee -a /sys/kernel/config/nvmet/ports/1/addr_trsvcid
echo ipv4 | sudo tee -a /sys/kernel/config/nvmet/ports/1/addr_adrfam

sudo ln -s /sys/kernel/config/nvmet/subsystems/nvmet-test /sys/kernel/config/nvmet/ports/1/subsystems/nvmet-test
sudo dmesg | grep nvmet_tcp

sudo modprobe nvme
sudo nvme discover -t tcp -a 127.0.0.1 -s 4420
sudo nvme connect -t tcp -n nvmet-test -a 127.0.0.1 -s 4420
sudo nvme list
cat /proc/partitions | grep nvme

# This one also returns an error immediately
sudo fio --filename=/dev/nvme0n1 --name=job --ioengine=libaio --direct=1 --size=1M --rw=rw --rwmixwrite=100 --bs=128K
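For reference, the knobs from your suggestion can be inspected under sysfs.
This is just a sketch of what I'd look at: I'm assuming the nvme_core module
parameters are writable at runtime on this kernel, and note the units differ
(the module parameter io_timeout is in seconds, the per-queue attribute is
in milliseconds):

cat /sys/module/nvme_core/parameters/max_retries
cat /sys/module/nvme_core/parameters/io_timeout
cat /sys/block/nvme0n1/queue/io_timeout

# Shorten both so any timeout/retry delay would be easy to observe
echo 1 | sudo tee /sys/module/nvme_core/parameters/max_retries
echo 5000 | sudo tee /sys/block/nvme0n1/queue/io_timeout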
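If I wanted the backend to actually stall instead of failing fast, one idea
(an untested sketch; I'm assuming the configfs attributes mirror the null_blk
module parameters of the same names and have to be set while the device is
powered off, and I don't know whether the bad-blocks error path honors the
completion delay) would be timer-based completions with a delay longer than
the host's io_timeout:

echo 0 | sudo tee -a /sys/kernel/config/nullb/nullb0/power
echo 2 | sudo tee -a /sys/kernel/config/nullb/nullb0/irqmode          # 2 = timer-based completion
echo 60000000000 | sudo tee -a /sys/kernel/config/nullb/nullb0/completion_nsec   # 60 s per I/O
echo 1 | sudo tee -a /sys/kernel/config/nullb/nullb0/power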
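And for completeness, the teardown between runs, roughly (order matters,
since the configfs directories can't be removed while still referenced):

sudo nvme disconnect -n nvmet-test
sudo rm /sys/kernel/config/nvmet/ports/1/subsystems/nvmet-test
sudo rmdir /sys/kernel/config/nvmet/ports/1
echo 0 | sudo tee -a /sys/kernel/config/nvmet/subsystems/nvmet-test/namespaces/1/enable
sudo rmdir /sys/kernel/config/nvmet/subsystems/nvmet-test/namespaces/1
sudo rmdir /sys/kernel/config/nvmet/subsystems/nvmet-test
echo 0 | sudo tee -a /sys/kernel/config/nullb/nullb0/power
sudo rmdir /sys/kernel/config/nullb/nullb0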