If you run tests such as generic/108 in a loop you'll eventually see a
failure, but the failure can be a false positive: the test was simply
unable to remove the scsi_debug module.

We need to give the refcnt some time to drop to 0. For instance, for
generic/108 the refcnt lingers between 2 and 1. It should be 0 when
we're done, but a bit of time seems to be required. The chance of us
running rmmod while the refcnt is still 2 or 1 is low, about 1 in 30
runs if you run the test in a loop on linux-next today.

Likewise, even when it's 0 we need a tiny breather before we can
remove the module (sleep 10 suffices), but this is only required on
older kernels. Otherwise removing the module will just fail.

Some of these races are documented in korg#212337 [0], and Doug Gilbert
has posted at least one patch attempt to help with this [1]. The patch
does not resolve all the issues, but it helps.

Using the patient module remover, _patient_rmmod, lets us remove the
cheesy retry loop. We keep the udevadm settle call as it can help
salvage buggy tests which forgot to call it.

We also special-case MODPROBE_PATIENT_RM_TIMEOUT_SECONDS being set to
"forever" when the initial module check finds the module in use. In
that case we try removing the module only once, since fstests would
not be the one leaving modules lingering around, and waiting forever
could mean you won't discover the issue for a while.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://lkml.kernel.org/r/20210508230745.27923-1-dgilbert@xxxxxxxxxxxx

Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
---
 common/scsi_debug | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/common/scsi_debug b/common/scsi_debug
index e7988469..1e0ca255 100644
--- a/common/scsi_debug
+++ b/common/scsi_debug
@@ -4,11 +4,32 @@
 #
 # Functions useful for tests on unique block devices
 
+. common/module
+
 _require_scsi_debug()
 {
-	# make sure we have the module and it's not already used
+	local mod_present=0
+
+	# make sure we have the module
 	modinfo scsi_debug 2>&1 > /dev/null || _notrun "scsi_debug module not found"
-	lsmod | grep -wq scsi_debug && (rmmod scsi_debug || _notrun "scsi_debug module in use")
+
+	lsmod | grep -wq scsi_debug
+	if [[ $? -eq 0 ]]; then
+		mod_present=1
+	fi
+
+	if [[ $mod_present -eq 1 ]]; then
+		# We try to remove the module only once if MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
+		# is set to forever because fstests does not leave modules
+		# lingering around. If you do have a module lingering around
+		# and it's being used, it wasn't us who started it, so you
+		# likely would not want to wait forever for it really.
+		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" == "forever" ]]; then
+			rmmod scsi_debug || _notrun "scsi_debug module in use and MODPROBE_PATIENT_RM_TIMEOUT_SECONDS set to forever, removing once failed"
+		else
+			_patient_rmmod scsi_debug || _notrun "scsi_debug module in use"
+		fi
+	fi
 	# make sure it has the features we need
 	# logical/physical sectors plus unmap support all went in together
 	modinfo scsi_debug | grep -wq sector_size || _notrun "scsi_debug too old"
@@ -44,14 +65,6 @@ _get_scsi_debug_dev()
 _put_scsi_debug_dev()
 {
 	lsmod | grep -wq scsi_debug || return
-
-	n=2
-	# use redirection not -q option of modprobe here, because -q of old
-	# modprobe is only quiet when the module is not found, not when the
-	# module is in use.
-	while [ $n -ge 0 ] && ! modprobe -nr scsi_debug >/dev/null 2>&1; do
-		$UDEV_SETTLE_PROG
-		n=$((n-1))
-	done
-	rmmod scsi_debug || _fail "Could not remove scsi_debug module"
+	$UDEV_SETTLE_PROG
+	_patient_rmmod scsi_debug || _fail "Could not remove scsi_debug module"
 }
-- 
2.30.2
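
For readers who want to see the refcnt wait in isolation, here is a rough sketch of the idea. It is not the _patient_rmmod helper from common/module, which this patch only calls and does not show. The function name wait_for_refcnt_zero and the SYSFS_MODULE_DIR override are hypothetical and exist only for this illustration. The sketch polls the module's refcnt once per second and gives up after a timeout.

```shell
#!/bin/bash
# Illustration only: NOT the _patient_rmmod implementation from common/module.
# SYSFS_MODULE_DIR is a hypothetical override so the sketch can be pointed
# at a fake directory tree instead of the real /sys/module.
SYSFS_MODULE_DIR="${SYSFS_MODULE_DIR:-/sys/module}"

# Wait for a module's refcnt to read 0.
#   $1: module name
#   $2: timeout in seconds
# Returns 0 if the refcnt is 0 or the module is not loaded, 1 on timeout.
wait_for_refcnt_zero()
{
	local module="$1"
	local timeout="$2"
	local refcnt_file="$SYSFS_MODULE_DIR/$module/refcnt"
	local waited=0
	local refcnt

	while true; do
		# No refcnt file means the module is not loaded.
		[ -f "$refcnt_file" ] || return 0
		# The file can vanish between the check and the read.
		refcnt=$(cat "$refcnt_file" 2>/dev/null) || return 0
		[ "$refcnt" -eq 0 ] && return 0
		# Still in use: stop once the timeout has been used up.
		[ "$waited" -ge "$timeout" ] && return 1
		sleep 1
		waited=$((waited + 1))
	done
}
```

A caller could then run something like `wait_for_refcnt_zero scsi_debug 50 && rmmod scsi_debug`. The 50 second timeout is an arbitrary value chosen for the example, not one taken from this patch.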