On 4/10/23 4:49 AM, Shin'ichiro Kawasaki wrote:
Hello Alan,
I noticed that nvme/039 has recently started failing intermittently on my system (in roughly 40% of runs).
The failure messages are as follows:
nvme/039 => nvme0n1 (test error logging) [failed]
runtime 0.176s ... 0.167s
--- tests/nvme/039.out 2023-04-06 10:11:07.925670528 +0900
+++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad 2023-04-10 20:15:07.679538017 +0900
@@ -1,5 +1,2 @@
Running nvme/039
- Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
- Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
- Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
Test complete
nvme/039 => nvme0n1 (test error logging) [failed]
runtime 0.167s ... 0.199s
--- tests/nvme/039.out 2023-04-06 10:11:07.925670528 +0900
+++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad 2023-04-10 20:15:09.114539650 +0900
@@ -1,5 +1,4 @@
Running nvme/039
- Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
Test complete
It looks like the expected error messages were not reported.
I suspect that the interval between enabling error injection and issuing the
I/O that should trigger the error is too short. With the one-line change below,
which adds a wait after enabling error injection, the failures disappear. Do you
think such a wait is a valid fix?
tests/nvme/rc | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/nvme/rc b/tests/nvme/rc
index 210a82a..7043c23 100644
--- a/tests/nvme/rc
+++ b/tests/nvme/rc
@@ -652,6 +652,7 @@ _nvme_enable_err_inject()
 	echo "$4" > /sys/kernel/debug/"$1"/fault_inject/dont_retry
 	echo "$5" > /sys/kernel/debug/"$1"/fault_inject/status
 	echo "$6" > /sys/kernel/debug/"$1"/fault_inject/times
+	sleep 0.1
 }
 
 _nvme_disable_err_inject()
I've been able to reproduce it. The sleep 0.1 helps but doesn't
eliminate the issue. I did notice that whenever there was a failure, there
was also a "blk_print_req_error: 2 callbacks suppressed" message in the log,
which would break the parsing the test needs to do.
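To check a failing run for this, one option is to total up the kernel ratelimiter's "<func>: N callbacks suppressed" notices in the captured log. The following is a minimal bash sketch, assuming the log text is piped in on stdin; count_suppressed is a hypothetical helper, not something that exists in blktests:

```bash
# Hypothetical helper (not part of blktests): read kernel log text on
# stdin, find ratelimiter notices of the form "<func>: N callbacks
# suppressed", and print the sum of all N values (0 if none are found).
count_suppressed() {
	local re='([0-9]+) callbacks suppressed'
	local line total=0
	# The "|| [[ -n $line ]]" also handles a final line with no newline.
	while IFS= read -r line || [[ -n $line ]]; do
		if [[ $line =~ $re ]]; then
			total=$((total + ${BASH_REMATCH[1]}))
		fi
	done
	echo "$total"
}
```

Piping the kernel log from a failing run into it (e.g. from dmesg) gives a non-zero count whenever the ratelimiter dropped messages during that window. It only shows that messages were dropped, not which ones, so it is a diagnostic rather than a fix.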
Alan