On Thu, Feb 13, 2025 at 3:55 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > On 2/13/25 3:32 PM, Amery Hung wrote: > > Fix a race condition between the main test_progs thread and the traffic > > monitoring thread. The traffic monitor thread tries to print a line > > using multiple printf and use flockfile() to prevent the line from being > > torn apart. Meanwhile, the main thread doing io redirection can reassign > > or close stdout when going through tests. A deadlock as shown below can > > happen. > > > > main traffic_monitor_thread > > ==== ====================== > > show_transport() > > -> flockfile(stdout) > > > > stdio_hijack_init() > > -> stdout = open_memstream(log_buf, log_cnt); > > ... > > env.subtest_state->stdout_saved = stdout; > > > > ... > > funlockfile(stdout) > > stdio_restore_cleanup() > > -> fclose(env.subtest_state->stdout_saved); > > Great debugging. > > Does it mean that the main thread will start the next test before the > traffic_monitor_thread has finished? Meaning the traffic_monitor_stop() does not > wait for the traffic_monitor_thread somehow? That part I think is fine. The race condition here happen between subtests within the same "netns_new" scope. For example, test_flow_dissector_skb_less_direct_attach. > > > > > After the traffic monitor thread lock stdout, A new memstream can be > > assigned to stdout by the main thread. Therefore, the traffic monitor > > thread later will not be able to unlock the original stdout. As the > > main thread tries to access the old stdout, it will hang indefinitely > > as it is still locked by the traffic monitor thread. > > > > The deadlock can be reproduced by running test_progs repeatedly with > > traffic monitor enabled: > > > > for ((i=1;i<=100;i++)); do > > ./test_progs -a flow_dissector_skb* -m '*' > > done > > > > Fix this by only calling printf once and remove flockfile()/funlockfile(). > > Yep. I agree this patch should be the better way to print the one-liner regardless. >