On Wed, Sep 15, 2021 at 8:26 PM Yucong Sun <fallentree@xxxxxx> wrote: > > From: Yucong Sun <sunyucong@xxxxxxxxx> > > This patch modify some tests to provide serial_test_name() instead of > test_name() to indicate it must run on worker 0. On encountering these tests, > all other threads will wait on a conditional variable, which worker 0 will > signal once the tests has finished running. > > Additionally, before running the test, thread 0 also check and wait until all > other threads has finished their current work, to make sure the pinned test > really are the only test running in the system. > > After this change, all tests should pass in '-j' mode. > > Signed-off-by: Yucong Sun <sunyucong@xxxxxxxxx> > --- I don't like this approach, it's over-complicated. I see two ways to do this more simple: 1) Let main process run all the serial tests before even instantiating workers, then run parallel tests with worker. The benefit of this is that we can structure code to have main testing loop that in sequential mode will run both parallel and serial tests, while in parallel mode will run only serial tests. 2) Teach worker 0 to run all serial tests one by one on start, before any parallel tests are run. Then we need to teach other workers to wait for it or teach main process to wait for worker 0 to finish before dispatching work to worker 1+. I think this is more convoluted and complicated. I certainly prefer #1. Additional benefit is that the worker and server's code would need to work consistently with all the data structured (preserving error logs until the end), etc. So it's a good test and forcing function to unify parallel and sequential modes. WDYT? > .../selftests/bpf/prog_tests/bpf_obj_id.c | 2 +- > .../bpf/prog_tests/select_reuseport.c | 2 +- > .../testing/selftests/bpf/prog_tests/timer.c | 2 +- > .../selftests/bpf/prog_tests/xdp_bonding.c | 2 +- > .../selftests/bpf/prog_tests/xdp_link.c | 2 +- > tools/testing/selftests/bpf/test_progs.c | 112 ++++++++++++++---- > 6 files changed, 95 insertions(+), 27 deletions(-) > [...] > @@ -954,15 +969,42 @@ void *dispatch_thread(void *ctx) > > test = &prog_test_defs[current_test_idx]; > test_to_run = current_test_idx; > - current_test_idx++; > > - pthread_mutex_unlock(¤t_test_lock); > - } > + test = &prog_test_defs[test_to_run]; that's the same test as current_test_idx above?... > > - if (!test->should_run) { > - continue; > - } > + if (!test->should_run) { > + current_test_idx++; > + pthread_mutex_unlock(¤t_test_lock); > + goto next; > + } > + > + if (is_serial_test(current_test_idx)) { > + if (data->worker_id != 0) { > + if (env.debug) > + fprintf(stderr, "[%d]: Waiting for thread 0 to finish serialized test: %d.\n", > + data->worker_id, current_test_idx + 1); > + /* wait for worker 0 to pick this job up and finish */ > + pthread_cond_wait(&wait_for_worker0, ¤t_test_lock); > + pthread_mutex_unlock(¤t_test_lock); > + goto next; > + } else { > + /* wait until all other worker has parked */ > + for (int i = 1; i < env.workers; i++) { > + if (env.worker_current_test[i] != -1) { > + if (env.debug) > + fprintf(stderr, "[%d]: Waiting for other threads to finish current test...\n", data->worker_id); > + pthread_mutex_unlock(¤t_test_lock); > + usleep(1 * 1000 * 1000); hm... I wonder if this contributes to those 20 seconds run time even for very fast tests... > + goto next; > + } > + } > + } > + } else { > + current_test_idx++; > + } [...] > + while (!all_finished) { > + all_finished = true; > + for (int i = 0; i < env.workers; i++) { > + if (!dispatcher_threads[i]) > + continue; > + > + if (pthread_tryjoin_np(dispatcher_threads[i], NULL) == EBUSY) { > + all_finished = false; > + if (!env.debug) continue; > + if (env.worker_current_test[i] == -1) > + fprintf(stderr, "Still waiting for thread %d (blocked by thread 0).\n", i); > + else > + fprintf(stderr, "Still waiting for thread %d (test #%d:%s).\n", > + i, env.worker_current_test[i] + 1, > + get_test_name(env.worker_current_test[i])); > + } else { > + dispatcher_threads[i] = 0; > + } > } > + usleep(10 * 1000 * 1000); and here you have 10 seconds just waiting doing nothing... > } > + > free(dispatcher_threads); > free(env.worker_current_test); > free(data); > @@ -1326,6 +1388,12 @@ int main(int argc, char **argv) > test->should_run = true; > else > test->should_run = false; > + > + if (test->run_test == NULL && test->run_serial_test == NULL) { > + fprintf(stderr, "Test %d:%s must have either test_%s() or serial_test_%sl() defined.\n", but not both, so let's check !!test->run_test == !!test->run_serial_test to make sure that only one is specified > + test->test_num, test->test_name, test->test_name, test->test_name); > + exit(EXIT_ERR_SETUP_INFRA); > + } > } > > /* ignore workers if we are just listing */ > -- > 2.30.2 >