The test case runs fio while disabling and enabling PCI device of the test target block device. This often leaves the devices in offline or dead status. For example, when the block device is a HDD connected to HBA, kernel makes the device into offline mode with this message: sd x:x:x:x Device offlined - not ready after error recovery This causes following test cases to fail. To avoid the failure, remove and rescan the devices to get them back to online or live status. This improvement is similar as the commit f8f33218eca7 ("block/011: recover test target NVME device capacity"). While at this change, improve code comments for the commit f8f33218eca7, and add missing local variable declarations. Of note is that the added rescan operation triggers a lockdep WARN if the system has devices which depend on P2SB [1]. [1] https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/ Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> --- tests/block/011 | 37 +++++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/tests/block/011 b/tests/block/011 index a4230f4..2a967d1 100755 --- a/tests/block/011 +++ b/tests/block/011 @@ -38,6 +38,8 @@ device_requires() { test_device() { echo "Running ${TEST_NAME}" + local pdev size rescan=false state i + pdev="$(_get_pci_dev_from_blkdev)" if _test_dev_is_rotational; then @@ -60,17 +62,36 @@ test_device() { echo "Test complete" - # This test triggers NVME controller resets. When any failure happens - # during the resets, the driver marks the NVME block devices with zero - # capacity. Then following tests fail with the zero capacity devices. To - # get back the correct capacity, remove and rescan the devices. + # This test triggers NVME controller resets. When failures happen during + # the resets, the driver marks the NVME block devices as zero capacity. + # Remove and rescan the devices to regain the correct capacity. if ((!$(<"$TEST_DEV_SYSFS/size"))); then - echo "$TEST_DEV has zero capacity" >> "$FULL" - if [[ -w $TEST_DEV_SYSFS/device/device/remove ]] && + echo "$TEST_DEV has zero capacity. Rescan it." >> "$FULL" + rescan=true + fi + + # This test case often makes NVME or HDDs connected to HBAs in offline + # or dead mode. Remove and rescan the devices to make them online again. + if [[ -r $TEST_DEV_SYSFS/device/state ]]; then + state=$(cat "$TEST_DEV_SYSFS/device/state") + if [[ $state == offline || $state == dead ]]; then + echo "$TEST_DEV is $state. Rescan it." >> "$FULL" + rescan=true + fi + fi + + if [[ $rescan == true ]]; then + if [[ -w /sys/bus/pci/devices/$pdev/remove ]] && [[ -w /sys/bus/pci/rescan ]]; then - echo "Rescan to tegain the correct capacity" >> "$FULL" - echo 1 > "$TEST_DEV_SYSFS/device/device/remove" + echo 1 > "/sys/bus/pci/devices/$pdev/remove" echo 1 > /sys/bus/pci/rescan + else + echo "Can not rescan PCI device for recovery" + return 1 fi + for ((i = 0; i < 10; i++)); do + [[ -w $TEST_DEV ]] && break + sleep 5 + done fi } -- 2.43.0