On 9/24/24 17:44, Shinichiro Kawasaki wrote: > On Sep 24, 2024 / 14:18, Nilay Shroff wrote: >> Avoid waiting indefinitely for nvme passthru namespace block device >> to appear. Wait for up to 5 seconds and during this time if namespace >> block device doesn't appear then bail out and FAIL the test. >> >> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx> >> --- >> Hi, >> >> You may find more details about this issue here[1]. >> >> I found that blktest nvme/033-037 hangs indefinitely when >> kernel rejects the passthru target namespace due to the >> duplicate IDs. This patch helps address this issue by >> ensuring that we bail out and fail the test if for any >> reason passthru target namspace is not created on the >> host. The relevant kernel patchv2 to fix the issue with >> duplicate IDs while using passthru loop target can be >> found here[2]. >> >> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@xxxxxxxxxxxxx/ >> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@xxxxxxxxxxxxx/ >> >> Thanks! > > Thanks for the patch :) It's bad that you stumbled into the infinite while loop > in _nvmet_passthru_target_connect(). I agree that it should be fixed. > > Please find my comments in line. > Thank you for your review comments!! >> --- >> tests/nvme/033 | 7 +++++-- >> tests/nvme/034 | 7 +++++-- >> tests/nvme/035 | 6 +++--- >> tests/nvme/036 | 14 ++++++++------ >> tests/nvme/037 | 6 +++++- >> tests/nvme/rc | 12 +++++++++++- >> 6 files changed, 37 insertions(+), 15 deletions(-) >> >> diff --git a/tests/nvme/033 b/tests/nvme/033 >> index 5e05175..171974e 100755 >> --- a/tests/nvme/033 >> +++ b/tests/nvme/033 >> @@ -62,8 +62,11 @@ test_device() { >> _nvmet_passthru_target_setup >> >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - compare_dev_info "${nsdev}" >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" > > I think some more specific failure message will be useful. How about > "Failed to connect" or so? Same comment for the other test cases 034-037. Agreed, I will add a meaningful error message in next patch. > >> + else >> + compare_dev_info "${nsdev}" >> + fi >> >> _nvme_disconnect_subsys >> _nvmet_passthru_target_cleanup >> diff --git a/tests/nvme/034 b/tests/nvme/034 >> index 154fc91..7625204 100755 >> --- a/tests/nvme/034 >> +++ b/tests/nvme/034 >> @@ -32,8 +32,11 @@ test_device() { >> >> _nvmet_passthru_target_setup >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + else >> + _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}" >> + fi >> >> _nvme_disconnect_subsys >> _nvmet_passthru_target_cleanup >> diff --git a/tests/nvme/035 b/tests/nvme/035 >> index ff217d6..6ad9c56 100755 >> --- a/tests/nvme/035 >> +++ b/tests/nvme/035 >> @@ -30,13 +30,13 @@ test_device() { >> >> _setup_nvmet >> >> - local ctrldev > > Good catch :) > >> local nsdev >> >> _nvmet_passthru_target_setup >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then >> echo "FAIL: fio verify failed" >> fi >> >> diff --git a/tests/nvme/036 b/tests/nvme/036 >> index 442ffe7..a67ca12 100755 >> --- a/tests/nvme/036 >> +++ b/tests/nvme/036 >> @@ -30,13 +30,15 @@ test_device() { > > This could be a good chance to add "local nsdev". Yeah will do. > >> >> _nvmet_passthru_target_setup >> nsdev=$(_nvmet_passthru_target_connect) >> - >> - ctrldev=$(_find_nvme_dev "${def_subsysnqn}") >> - >> - if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then >> - echo "ERROR: reset failed" >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + else >> + ctrldev=$(_find_nvme_dev "${def_subsysnqn}") >> + >> + if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then >> + echo "ERROR: reset failed" >> + fi >> fi >> - > > Nit: unnecessary change here. Okay, will remove this. > >> _nvme_disconnect_subsys >> _nvmet_passthru_target_cleanup >> >> diff --git a/tests/nvme/037 b/tests/nvme/037 >> index f7ddc2d..f0c8a77 100755 >> --- a/tests/nvme/037 >> +++ b/tests/nvme/037 >> @@ -27,7 +27,6 @@ test_device() { >> >> local subsys="blktests-subsystem-" >> local iterations=10 >> - local ctrldev > > Good catch. And we can add "local nsdev" here. yes will do. > >> >> for ((i = 0; i < iterations; i++)); do >> _nvmet_passthru_target_setup --subsysnqn "${subsys}${i}" >> @@ -37,6 +36,11 @@ test_device() { >> _nvme_disconnect_subsys \ >> --subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1 >> _nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}" >> + >> + if [[ -z "$nsdev" ]]; then >> + echo "FAIL" >> + break >> + fi >> done >> >> echo "Test complete" >> diff --git a/tests/nvme/rc b/tests/nvme/rc >> index a877de3..3def0d0 100644 >> --- a/tests/nvme/rc >> +++ b/tests/nvme/rc >> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() { >> >> _nvmet_passthru_target_connect() { >> local subsysnqn="$def_subsysnqn" >> + local timeout="5" >> >> while [[ $# -gt 0 ]]; do >> case $1 in >> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() { >> # The following tests can race with the creation >> # of the device so ensure the block device exists >> # before continuing >> - while [ ! -b "${nsdev}" ]; do sleep 1; done >> + start_time=$(date +%s) >> + while [ ! -b "${nsdev}" ]; do >> + sleep .1 > > Is there any reason to have 0.1 second wait duration? When I changed this to > "sleep 1", runtime of the test cases did not change on my test system. > yes you're right there won't be any noticeable change in the runtime of the test unless we run this test in a loop for multiple number of times. I would say that the gain would be only a fraction of a second in this case. So I will use sleep 1 instead of .1 as you suggested. >> + end_time=$(date +%s) >> + if ((end_time - start_time > timeout)); then >> + echo "" > > This echo doesn't look required. > >> + return 1 >> + fi > > If 1 second wait is okay instead of 0.1 second wait, the if block above can be > a bit simpler, like, > > if ((++count >= timeout)); then > return 1 > fi > > where, "local count=0" should be declared before. > yeah with sleep 1 we can make it simple. I will update this in next patch. >> + done >> >> echo "${nsdev}" >> + return 0 > > This "return 0" looks redundant. The previous echo succeeds always, then 0 is > returned anyway. Also, all of the callers check the failure of this function by > referring to the nsdev string echoed. They do not refer to the return code. So, > it is not so useful to explicitly show that 0 is returned on success. > yes I will incorporate it in the next patch. Thanks, --Nilay