[PATCH blktests 1/2] block/011: recover test target devices to online or live status

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The test case runs fio while disabling and enabling PCI device of the
test target block device. This often leaves the devices in offline or
dead status. For example, when the block device is a HDD connected to
HBA, kernel makes the device into offline mode with this message:

    sd x:x:x:x Device offlined - not ready after error recovery

This causes following test cases to fail. To avoid the failure, remove
and rescan the devices to get them back to online or live status. This
improvement is similar as the commit f8f33218eca7 ("block/011: recover
test target NVME device capacity"). While at this change, improve code
comments for the commit f8f33218eca7, and add missing local variable
declarations.

Of note is that the added rescan operation triggers a lockdep WARN if
the system has devices which depend on P2SB [1].

[1] https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
---
 tests/block/011 | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/tests/block/011 b/tests/block/011
index a4230f4..2a967d1 100755
--- a/tests/block/011
+++ b/tests/block/011
@@ -38,6 +38,8 @@ device_requires() {
 test_device() {
 	echo "Running ${TEST_NAME}"
 
+	local pdev size rescan=false state i
+
 	pdev="$(_get_pci_dev_from_blkdev)"
 
 	if _test_dev_is_rotational; then
@@ -60,17 +62,36 @@ test_device() {
 
 	echo "Test complete"
 
-	# This test triggers NVME controller resets. When any failure happens
-	# during the resets, the driver marks the NVME block devices with zero
-	# capacity. Then following tests fail with the zero capacity devices. To
-	# get back the correct capacity, remove and rescan the devices.
+	# This test triggers NVME controller resets. When failures happen during
+	# the resets, the driver marks the NVME block devices as zero capacity.
+	# Remove and rescan the devices to regain the correct capacity.
 	if ((!$(<"$TEST_DEV_SYSFS/size"))); then
-		echo "$TEST_DEV has zero capacity" >> "$FULL"
-		if [[ -w $TEST_DEV_SYSFS/device/device/remove ]] &&
+		echo "$TEST_DEV has zero capacity. Rescan it." >> "$FULL"
+		rescan=true
+	fi
+
+	# This test case often makes NVME or HDDs connected to HBAs in offline
+	# or dead mode. Remove and rescan the devices to make them online again.
+	if [[ -r $TEST_DEV_SYSFS/device/state ]]; then
+		state=$(cat "$TEST_DEV_SYSFS/device/state")
+		if [[ $state == offline || $state == dead ]]; then
+			echo "$TEST_DEV is $state. Rescan it." >> "$FULL"
+			rescan=true
+		fi
+	fi
+
+	if [[ $rescan == true ]]; then
+		if [[ -w /sys/bus/pci/devices/$pdev/remove ]] &&
 			   [[ -w /sys/bus/pci/rescan ]]; then
-			echo "Rescan to tegain the correct capacity" >> "$FULL"
-			echo 1 > "$TEST_DEV_SYSFS/device/device/remove"
+			echo 1 > "/sys/bus/pci/devices/$pdev/remove"
 			echo 1 > /sys/bus/pci/rescan
+		else
+			echo "Can not rescan PCI device for recovery"
+			return 1
 		fi
+		for ((i = 0; i < 10; i++)); do
+			[[ -w $TEST_DEV ]] && break
+			sleep 5
+		done
 	fi
 }
-- 
2.43.0





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux