On Tue, Mar 4, 2025 at 4:12 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Tue, 2025-03-04 at 08:36 -0800, Amery Hung wrote: > > Traffic monitor thread may see dangling stdout as the main thread closes > > and reassigns stdout without protection. This happens when the main thread > > finishes one subtest and moves to another one in the same netns_new() > > scope. > > The issue can be reproduced by running test_progs repeatedly with traffic > > monitor enabled: > > > > for ((i=1;i<=100;i++)); do > > ./test_progs -a flow_dissector_skb* -m '*' > > done > > > > Fix it by first consolidating stdout assignment into stdio_restore(). > > stdout will be restored to env.stdout_saved when a test ends or running > > in the crash handler and to test_state.stdout_saved otherwise. > > Then, protect use/close/reassignment of stdout with a lock. The locking > > in the main thread is always performed regradless of whether traffic > > monitor is running or not for simplicity. It won't have any side-effect. > > stdio_restore() is kept in the crash handler instead of making all print > > functions in the crash handler use env.stdout_saved to make it less > > error-prone. > > > > Signed-off-by: Amery Hung <ameryhung@xxxxxxxxx> > > --- > > This patch fixes the error for me. > > Tested-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > > > tools/testing/selftests/bpf/test_progs.c | 59 ++++++++++++++++-------- > > 1 file changed, 39 insertions(+), 20 deletions(-) > > > > diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c > > index ab0f2fed3c58..5b89f6ca5a0a 100644 > > --- a/tools/testing/selftests/bpf/test_progs.c > > +++ b/tools/testing/selftests/bpf/test_progs.c > > @@ -88,7 +88,11 @@ static void stdio_hijack(char **log_buf, size_t *log_cnt) > > #endif > > } > > > > -static void stdio_restore_cleanup(void) > > +static pthread_mutex_t stdout_lock = PTHREAD_MUTEX_INITIALIZER; > > + > > +static bool in_crash_handler(void); > > + > > +static void stdio_restore(void) > > { > > #ifdef __GLIBC__ > > if (verbose() && env.worker_id == -1) { > > @@ -98,34 +102,34 @@ static void stdio_restore_cleanup(void) > > > > fflush(stdout); > > > > - if (env.subtest_state) { > > + pthread_mutex_lock(&stdout_lock); > > + > > + if (!env.subtest_state || in_crash_handler()) { > > + if (stdout == env.stdout_saved) > > + goto out; > > + > > + fclose(env.test_state->stdout_saved); > > + env.test_state->stdout_saved = NULL; > > + stdout = env.stdout_saved; > > + stderr = env.stderr_saved; > > + } else { > > fclose(env.subtest_state->stdout_saved); > > env.subtest_state->stdout_saved = NULL; > > stdout = env.test_state->stdout_saved; > > stderr = env.test_state->stdout_saved; > > - } else { > > - fclose(env.test_state->stdout_saved); > > - env.test_state->stdout_saved = NULL; > > } > > +out: > > + pthread_mutex_unlock(&stdout_lock); > > #endif > > } > > stdio_restore_cleanup() did not reset stderr/stdout when > env.subtest_state was NULL, but this difference does not seem to > matter, stdio_restore_cleanup() was called from: > - test__start_subtest(), where stdio_hijack_init() would override > stderr/stdout anyways. > - run_one_test(), where it is followed by call to stdio_restore(). > > I think this change is Ok. > > [...] > > > @@ -1276,6 +1281,18 @@ void crash_handler(int signum) > > backtrace_symbols_fd(bt, sz, STDERR_FILENO); > > } > > > > +static bool in_crash_handler(void) > > +{ > > + struct sigaction sigact; > > + > > + /* sa_handler will be cleared if invoked since crash_handler is > > + * registered with SA_RESETHAND > > + */ > > + sigaction(SIGSEGV, NULL, &sigact); > > + > > + return sigact.sa_handler != crash_handler; > > +} > > + > > The patch would be simpler w/o this function. I double checked > functions called from crash_handler() and two 'fprintf(stderr, ...)' > there are the only places where stderr/stdout is used instead of > *_saved versions. It is already a prevalent pattern to do > 'fprintf(env.stderr_saved, ...)' in this file. > Or pass a flag as in v3? While that is a common pattern, I think restoring stdio there can make sure people don't accidentally use stdout in the crash handler and find that it does not work occasionally. I will remove the in_crash_handler special case handling per Martin's suggestion. > > > void hexdump(const char *prefix, const void *buf, size_t len) > > { > > for (int i = 0; i < len; i++) { > > @@ -1957,6 +1974,8 @@ int main(int argc, char **argv) > > [...] >