Re: [PATCH blktests] nvme/{033-037}: timeout while waiting for nvme passthru namespace device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sep 24, 2024 / 14:18, Nilay Shroff wrote:
> Avoid waiting indefinitely for nvme passthru namespace block device
> to appear. Wait for up to 5 seconds and during this time if namespace
> block device doesn't appear then bail out and FAIL the test.
> 
> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
> ---
> Hi,
> 
> You may find more details about this issue here[1]. 
> 
> I found that blktest nvme/033-037 hangs indefinitely when 
> kernel rejects the passthru target namespace due to the 
> duplicate IDs. This patch helps address this issue by  
> ensuring that we bail out and fail the test if for any 
> reason passthru target namspace is not created on the 
> host. The relevant kernel patchv2 to fix the issue with 
> duplicate IDs while using passthru loop target can be
> found here[2].
> 
> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@xxxxxxxxxxxxx/
> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@xxxxxxxxxxxxx/
> 
> Thanks!

Thanks for the patch :) It's bad that you stumbled into the infinite while loop
in _nvmet_passthru_target_connect(). I agree that it should be fixed.

Please find my comments in line.

> ---
>  tests/nvme/033 |  7 +++++--
>  tests/nvme/034 |  7 +++++--
>  tests/nvme/035 |  6 +++---
>  tests/nvme/036 | 14 ++++++++------
>  tests/nvme/037 |  6 +++++-
>  tests/nvme/rc  | 12 +++++++++++-
>  6 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/nvme/033 b/tests/nvme/033
> index 5e05175..171974e 100755
> --- a/tests/nvme/033
> +++ b/tests/nvme/033
> @@ -62,8 +62,11 @@ test_device() {
>  	_nvmet_passthru_target_setup
>  
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	compare_dev_info "${nsdev}"
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"

I think some more specific failure message will be useful. How about
"Failed to connect" or so? Same comment for the other test cases 034-037.

> +	else
> +		compare_dev_info "${nsdev}"
> +	fi
>  
>  	_nvme_disconnect_subsys
>  	_nvmet_passthru_target_cleanup
> diff --git a/tests/nvme/034 b/tests/nvme/034
> index 154fc91..7625204 100755
> --- a/tests/nvme/034
> +++ b/tests/nvme/034
> @@ -32,8 +32,11 @@ test_device() {
>  
>  	_nvmet_passthru_target_setup
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"
> +	else
> +		_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
> +	fi
>  
>  	_nvme_disconnect_subsys
>  	_nvmet_passthru_target_cleanup
> diff --git a/tests/nvme/035 b/tests/nvme/035
> index ff217d6..6ad9c56 100755
> --- a/tests/nvme/035
> +++ b/tests/nvme/035
> @@ -30,13 +30,13 @@ test_device() {
>  
>  	_setup_nvmet
>  
> -	local ctrldev

Good catch :)

>  	local nsdev
>  
>  	_nvmet_passthru_target_setup
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"
> +	elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>  		echo "FAIL: fio verify failed"
>  	fi
>  
> diff --git a/tests/nvme/036 b/tests/nvme/036
> index 442ffe7..a67ca12 100755
> --- a/tests/nvme/036
> +++ b/tests/nvme/036
> @@ -30,13 +30,15 @@ test_device() {

This could be a good chance to add "local nsdev".

>  
>  	_nvmet_passthru_target_setup
>  	nsdev=$(_nvmet_passthru_target_connect)
> -
> -	ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
> -
> -	if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
> -		echo "ERROR: reset failed"
> +	if [[ -z "$nsdev" ]]; then
> +		echo "FAIL"
> +	else
> +		ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
> +
> +		if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
> +			echo "ERROR: reset failed"
> +		fi
>  	fi
> -

Nit: unnecessary change here.

>  	_nvme_disconnect_subsys
>  	_nvmet_passthru_target_cleanup
>  
> diff --git a/tests/nvme/037 b/tests/nvme/037
> index f7ddc2d..f0c8a77 100755
> --- a/tests/nvme/037
> +++ b/tests/nvme/037
> @@ -27,7 +27,6 @@ test_device() {
>  
>  	local subsys="blktests-subsystem-"
>  	local iterations=10
> -	local ctrldev

Good catch. And we can add "local nsdev" here.

>  
>  	for ((i = 0; i < iterations; i++)); do
>  		_nvmet_passthru_target_setup --subsysnqn "${subsys}${i}"
> @@ -37,6 +36,11 @@ test_device() {
>  		_nvme_disconnect_subsys \
>  			--subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1
>  		_nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}"
> +
> +		if [[ -z "$nsdev" ]]; then
> +			echo "FAIL"
> +			break
> +		fi
>  	done
>  
>  	echo "Test complete"
> diff --git a/tests/nvme/rc b/tests/nvme/rc
> index a877de3..3def0d0 100644
> --- a/tests/nvme/rc
> +++ b/tests/nvme/rc
> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() {
>  
>  _nvmet_passthru_target_connect() {
>  	local subsysnqn="$def_subsysnqn"
> +	local timeout="5"
>  
>  	while [[ $# -gt 0 ]]; do
>  		case $1 in
> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() {
>  	# The following tests can race with the creation
>  	# of the device so ensure the block device exists
>  	# before continuing
> -	while [ ! -b "${nsdev}" ]; do sleep 1; done
> +	start_time=$(date +%s)
> +	while [ ! -b "${nsdev}" ]; do
> +		sleep .1

Is there any reason to have 0.1 second wait duration? When I changed this to
"sleep 1", runtime of the test cases did not change on my test system.

> +		end_time=$(date +%s)
> +		if ((end_time - start_time > timeout)); then
> +			echo ""

This echo doesn't look required.

> +			return 1
> +		fi

If 1 second wait is okay instead of 0.1 second wait, the if block above can be
a bit simpler, like,

              if ((++count >= timeout)); then
                      return 1
              fi

where, "local count=0" should be declared before.

> +	done
>  
>  	echo "${nsdev}"
> +	return 0

This "return 0" looks redundant. The previous echo succeeds always, then 0 is
returned anyway. Also, all of the callers check the failure of this function by
referring to the nsdev string echoed. They do not refer to the return code. So,
it is not so useful to explicitly show that 0 is returned on success.

>  }
>  
>  _nvmet_passthru_target_cleanup() {
> -- 
> 2.45.2
> 




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux