On Jan 11, 2024 / 00:00, Chaitanya Kulkarni wrote:
> On 1/10/2024 4:23 AM, Shinichiro Kawasaki wrote:
> > Thanks again for this test case. Please find my review comments. Tomorrow, I
> > will try to run this test case.
> >
> > On Jan 09, 2024 / 19:57, Chaitanya Kulkarni wrote:
> >> Trigger and test nvme-pci timeout with concurrent fio jobs.
> >
> > Is there any related kernel commit which motivated this test case? If so,
> > can we add a kernel commit or e-mail discussion link as a reference?
> >
>
> No, this part is never tested on a regular basis, as it has some
> complicated logic. I believe it is a much needed test.

Okay, it's good to expand the test coverage.

...

> >> +}
> >> +
> >> +test() {
> >> +	local dev="/dev/nvme0n1"
> >> +
> >> +	echo "Running ${TEST_NAME}"
> >> +
> >> +	echo 1 > /sys/block/nvme0n1/io-timeout-fail
> >> +
> >> +	echo 99 > /sys/kernel/debug/fail_io_timeout/probability
> >> +	echo 10 > /sys/kernel/debug/fail_io_timeout/interval
> >> +	echo -1 > /sys/kernel/debug/fail_io_timeout/times
> >> +	echo 0 > /sys/kernel/debug/fail_io_timeout/space
> >> +	echo 1 > /sys/kernel/debug/fail_io_timeout/verbose
> >
> > To avoid impact on the following test cases, I suggest restoring the
> > original values of the fail_io_timeout/* sysfs attributes above at the end
> > of this test case. _nvme_err_inject_setup() and _nvme_err_inject_cleanup()
> > in tests/nvme/rc do a similar thing. We can add new helper functions in the
> > same manner.
>
> I can add the code for config and restore, but at this point there are
> no other test cases using this? (Correct me if I'm wrong.)

This new test case is the only user of fail_io_timeout, I believe.

> So instead of bloating the code in nvme/rc, let's keep it for this test
> case only, and add common helper code later once we have more users for
> this case?

I think either way is good.
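For reference, helpers in the manner the review suggests could look like the minimal sketch below. The function names _save_attrs/_restore_attrs and the associative-array approach are my own illustration, not existing blktests API:

```shell
#!/bin/bash
# Minimal sketch of save/restore helpers in the spirit of
# _nvme_err_inject_setup()/_nvme_err_inject_cleanup(). The function
# names and the SAVED_ATTRS array are hypothetical, not blktests API.

declare -A SAVED_ATTRS

# Save the current value of each attribute file under directory $1.
_save_attrs() {
	local dir=$1 attr
	shift
	for attr in "$@"; do
		SAVED_ATTRS[$attr]=$(<"$dir/$attr")
	done
}

# Write the saved values back, restoring the pre-test state.
_restore_attrs() {
	local dir=$1 attr
	shift
	for attr in "$@"; do
		echo "${SAVED_ATTRS[$attr]}" > "$dir/$attr"
	done
}
```

A test could then call `_save_attrs /sys/kernel/debug/fail_io_timeout probability interval times space verbose` before injecting, and `_restore_attrs` with the same arguments on its cleanup path.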
> >
> >> +
> >> +	# shellcheck disable=SC2046
> >> +	fio --bs=4k --rw=randread --norandommap --numjobs=$(nproc) \
> >
> > Double quotes for "$(nproc)" will probably allow removing the shellcheck
> > exception.
>
> Not sure I understand this comment, can you please elaborate?

I suggest this:

diff --git a/tests/nvme/050 b/tests/nvme/050
index ba54bcd..8e5acb2 100755
--- a/tests/nvme/050
+++ b/tests/nvme/050
@@ -28,8 +28,7 @@ test() {
 	echo 0 > /sys/kernel/debug/fail_io_timeout/space
 	echo 1 > /sys/kernel/debug/fail_io_timeout/verbose
 
-	# shellcheck disable=SC2046
-	fio --bs=4k --rw=randread --norandommap --numjobs=$(nproc) \
+	fio --bs=4k --rw=randread --norandommap --numjobs="$(nproc)" \
 		--name=reads --direct=1 --filename=${dev} --group_reporting \
 		--time_based --runtime=1m 2>&1 | grep -q "Input/output error"
 
> >
> >> +		--name=reads --direct=1 --filename=${dev} --group_reporting \
> >> +		--time_based --runtime=1m 2>&1 | grep -q "Input/output error"
> >> +
> >> +	# shellcheck disable=SC2181
> >> +	if [ $? -eq 0 ]; then
> >> +		echo "Test complete"
> >> +	else
> >> +		echo "Test failed"
> >> +	fi
> >> +	modprobe -r nvme
> >
> > If the nvme driver is built-in, this unload command does not work.
> > Do we really need to unload the nvme driver here?
> >
>
> Yes, after the timeout handler executes we need to make sure that the
> device can be removed at the time of module unload. Perhaps there is a
> way to avoid this when nvme is a built-in module, so that the test will
> not execute the above? Any suggestions?

I see... One idea I can think of is to remove the device using PCI device
sysfs. How about the code below? block/011 does a similar thing. To do
that for TEST_DEV, we can use _get_pci_from_dev_sysfs(). Does this meet
your intent?
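As an aside, the SC2181 exception above could also be dropped by testing the pipeline's exit status directly in the `if`, rather than inspecting `$?` afterwards. A minimal sketch, with `echo` standing in for the real fio invocation:

```shell
#!/bin/bash
# Sketch: test the pipeline directly in the `if` condition, which is
# what SC2181 recommends instead of a separate `[ $? -eq 0 ]` check.
# `echo` stands in for the fio run that produces the error message.
if echo "fio: Input/output error" | grep -q "Input/output error"; then
	echo "Test complete"
else
	echo "Test failed"
fi
```

This keeps the same pass/fail logic while letting shellcheck see the command whose status is being tested.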
diff --git a/tests/nvme/050 b/tests/nvme/050
index 8e5acb2..592f167 100755
--- a/tests/nvme/050
+++ b/tests/nvme/050
@@ -17,6 +17,9 @@ requires() {
 
 test() {
 	local dev="/dev/nvme0n1"
+	local pdev
+
+	pdev="$(_get_pci_from_dev_sysfs /dev/nvme0n1)"
 
 	echo "Running ${TEST_NAME}"
 
@@ -38,5 +41,11 @@ test() {
 	else
 		echo "Test failed"
 	fi
-	modprobe -r nvme
+
+	# Remove and rescan the NVMe device to ensure that it has come back
+	echo 1 > "/sys/bus/pci/devices/$pdev/remove"
+	echo 1 > /sys/bus/pci/rescan
+	if [[ ! -b $TEST_DEV ]]; then
+		echo "Failed to regain $TEST_DEV"
+	fi
 }

BTW, when I ran the test case, I observed that it failed. I took a quick
look and saw that the nvme driver reported an I/O timeout in the kernel
messages, but fio did not report the expected I/O error. I'm not sure
why; I will look closer when I check the v2 patch.
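One caveat with the remove/rescan approach: the PCI rescan re-enumerates the device asynchronously, so a single `-b` check immediately after the rescan can race with the device node reappearing. A hypothetical polling helper (the name `_wait_for_path` is my own, not an existing blktests function) could harden that check:

```shell
#!/bin/bash
# Hypothetical helper: poll for a path to (re)appear for up to roughly
# $2 seconds (default 10), instead of checking once right after the
# PCI rescan. Returns 0 once the path exists, 1 on timeout.
_wait_for_path() {
	local path=$1 tries=$((${2:-10} * 10)) i
	for ((i = 0; i < tries; i++)); do
		[[ -e $path ]] && return 0
		sleep 0.1
	done
	return 1
}
```

The one-shot `[[ ! -b $TEST_DEV ]]` check in the diff above could then become `_wait_for_path "$TEST_DEV" 10 || echo "Failed to regain $TEST_DEV"`.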