On 7/8/24 15:44, Adrian Moreno wrote: > There are a couple of places where the test script "sleep"s to wait for > some external condition to be met. > > This is error prone, specially in slow systems (identified in CI by > "KSFT_MACHINE_SLOW=yes"). > > To fix this, add a "ovs_wait" function that tries to execute a command > a few times until it succeeds. The timeout used is set to 5s for > "normal" systems and doubled if a slow CI machine is detected. > > This should make the following work: > > $ vng --build \ > --config tools/testing/selftests/net/config \ > --config kernel/configs/debug.config > > $ vng --run . --user root -- "make -C tools/testing/selftests/ \ > KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests" > > Signed-off-by: Adrian Moreno <amorenoz@xxxxxxxxxx> > --- > .../selftests/net/openvswitch/openvswitch.sh | 49 ++++++++++++++++--- > .../selftests/net/openvswitch/ovs-dpctl.py | 1 + > 2 files changed, 42 insertions(+), 8 deletions(-) > Hi, Adrian. See a small pile of nitpicks below. None of them are blocking from my perspective, except for a typo. Just listed them since there is a typo anyway. > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh > index bc71dbc18b21..83407b42073a 100755 > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh > @@ -11,6 +11,7 @@ ksft_skip=4 > PAUSE_ON_FAIL=no > VERBOSE=0 > TRACING=0 > +WAIT_TIMEOUT=5 > > tests=" > arp_ping eth-arp: Basic arp ping between two NS > @@ -29,6 +30,32 @@ info() { > [ $VERBOSE = 0 ] || echo $* > } > > +ovs_wait() { > + info "waiting $WAIT_TIMEOUT s for: $@" > + > + "$@" > + if [[ $? -eq 0 ]]; then Maybe just 'if "$@"; then' ? > + info "wait succeeded inmediately" * immediately > + return 0 > + fi > + > + # A quick re-check helps speed up small races in fast systems. > + # However, fractional sleeps might not necessarily work. > + local start=0 > + sleep 0.1 || { sleep 1; start=1; } > + > + for (( i=start; i<WAIT_TIMEOUT; i++ )); do for i in $(seq ${start} ${WAIT_TIMEOUT}); do Will need to initialize start to 1 and 2. It works, but seems like an unnecessary use of non-POSIX constructs. > + "$@" > + if [[ $? -eq 0 ]]; then if "$@"; then > + info "wait succeeded after $i seconds" > + return 0 > + fi > + sleep 1 > + done > + info "wait failed after $i seconds" > + return 1 > +} > + > ovs_base=`pwd` > sbxs= > sbx_add () { > @@ -278,20 +305,21 @@ test_psample() { > > # Record psample data. > ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py psample-events > + ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout > > # Send a single ping. > - sleep 1 > ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1 > - sleep 1 > > # We should have received one userspace action upcall and 2 psample packets. > - grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1 > + ovs_wait grep -q "userspace action command" $ovs_dir/s0.out > + [[ $? -eq 0 ]] || return 1 Why checking separately and not one the same line with || return 1 ? Also double brackets seem unnecessary. > > # client -> server samples should only contain the first 14 bytes of the packet. > - grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \ > - $ovs_dir/stdout >/dev/null 2>&1 || return 1 > - grep -E "rate:4294967295,group:2,cookie:eeff0c" \ > - $ovs_dir/stdout >/dev/null 2>&1 || return 1 > + ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" $ovs_dir/stdout > + [[ $? -eq 0 ]] || return 1 > + > + ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout > + [[ $? -eq 0 ]] || return 1 Same for above two. > > return 0 > } > @@ -711,7 +739,8 @@ test_upcall_interfaces() { > ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \ > 172.31.110.1/24 -u || return 1 > > - sleep 1 > + ovs_wait grep -q "listening on upcall packet handler" ${ovs_dir}/left0.out > + > info "sending arping" > ip netns exec upc arping -I l0 172.31.110.20 -c 1 \ > >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr > @@ -811,6 +840,10 @@ shift $(($OPTIND-1)) > IFS=" > " > > +if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then > + WAIT_TIMEOUT=10 > +fi Should this be done closer to the first initialization of WAIT_TIMEOUT ? > + > for arg do > # Check first that all requested tests are available before running any > command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; } > diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > index 1e15b0818074..8a0396bfaf99 100644 > --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > @@ -2520,6 +2520,7 @@ class PsampleEvent(EventSocket): > marshal_class = psample_msg > > def read_samples(self): > + print("listening for psample events", flush=True) > while True: > try: > for msg in self.get():