On Tue, 2025-03-04 at 08:36 -0800, Amery Hung wrote: > Traffic monitor thread may see dangling stdout as the main thread closes > and reassigns stdout without protection. This happens when the main thread > finishes one subtest and moves to another one in the same netns_new() > scope. > The issue can be reproduced by running test_progs repeatedly with traffic > monitor enabled: > > for ((i=1;i<=100;i++)); do > ./test_progs -a flow_dissector_skb* -m '*' > done > > Fix it by first consolidating stdout assignment into stdio_restore(). > stdout will be restored to env.stdout_saved when a test ends or running > in the crash handler and to test_state.stdout_saved otherwise. > Then, protect use/close/reassignment of stdout with a lock. The locking > in the main thread is always performed regradless of whether traffic > monitor is running or not for simplicity. It won't have any side-effect. > stdio_restore() is kept in the crash handler instead of making all print > functions in the crash handler use env.stdout_saved to make it less > error-prone. > > Signed-off-by: Amery Hung <ameryhung@xxxxxxxxx> > --- This patch fixes the error for me. Tested-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > tools/testing/selftests/bpf/test_progs.c | 59 ++++++++++++++++-------- > 1 file changed, 39 insertions(+), 20 deletions(-) > > diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c > index ab0f2fed3c58..5b89f6ca5a0a 100644 > --- a/tools/testing/selftests/bpf/test_progs.c > +++ b/tools/testing/selftests/bpf/test_progs.c > @@ -88,7 +88,11 @@ static void stdio_hijack(char **log_buf, size_t *log_cnt) > #endif > } > > -static void stdio_restore_cleanup(void) > +static pthread_mutex_t stdout_lock = PTHREAD_MUTEX_INITIALIZER; > + > +static bool in_crash_handler(void); > + > +static void stdio_restore(void) > { > #ifdef __GLIBC__ > if (verbose() && env.worker_id == -1) { > @@ -98,34 +102,34 @@ static void stdio_restore_cleanup(void) > > fflush(stdout); > > - if (env.subtest_state) { > + pthread_mutex_lock(&stdout_lock); > + > + if (!env.subtest_state || in_crash_handler()) { > + if (stdout == env.stdout_saved) > + goto out; > + > + fclose(env.test_state->stdout_saved); > + env.test_state->stdout_saved = NULL; > + stdout = env.stdout_saved; > + stderr = env.stderr_saved; > + } else { > fclose(env.subtest_state->stdout_saved); > env.subtest_state->stdout_saved = NULL; > stdout = env.test_state->stdout_saved; > stderr = env.test_state->stdout_saved; > - } else { > - fclose(env.test_state->stdout_saved); > - env.test_state->stdout_saved = NULL; > } > +out: > + pthread_mutex_unlock(&stdout_lock); > #endif > } stdio_restore_cleanup() did not reset stderr/stdout when env.subtest_state was NULL, but this difference does not seem to matter, stdio_restore_cleanup() was called from: - test__start_subtest(), where stdio_hijack_init() would override stderr/stdout anyways. - run_one_test(), where it is followed by call to stdio_restore(). I think this change is Ok. [...] > @@ -1276,6 +1281,18 @@ void crash_handler(int signum) > backtrace_symbols_fd(bt, sz, STDERR_FILENO); > } > > +static bool in_crash_handler(void) > +{ > + struct sigaction sigact; > + > + /* sa_handler will be cleared if invoked since crash_handler is > + * registered with SA_RESETHAND > + */ > + sigaction(SIGSEGV, NULL, &sigact); > + > + return sigact.sa_handler != crash_handler; > +} > + The patch would be simpler w/o this function. I double checked functions called from crash_handler() and two 'fprintf(stderr, ...)' there are the only places where stderr/stdout is used instead of *_saved versions. It is already a prevalent pattern to do 'fprintf(env.stderr_saved, ...)' in this file. Or pass a flag as in v3? > void hexdump(const char *prefix, const void *buf, size_t len) > { > for (int i = 0; i < len; i++) { > @@ -1957,6 +1974,8 @@ int main(int argc, char **argv) [...]