On 28/04/2021 16.59, Florent Revest wrote:
> On Tue, Apr 27, 2021 at 8:03 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
>>
>> On Tue, Apr 27, 2021 at 2:51 AM Florent Revest <revest@xxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Apr 27, 2021 at 8:35 AM Rasmus Villemoes
>>> <linux@xxxxxxxxxxxxxxxxxx> wrote:
>>>>  	u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
>>>> -	enum bpf_printf_mod_type mod[MAX_TRACE_PRINTK_VARARGS];
>>>> +	u32 *bin_args;
>>>>  	static char buf[BPF_TRACE_PRINTK_SIZE];
>>>>  	unsigned long flags;
>>>>  	int ret;
>>>>
>>>> -	ret = bpf_printf_prepare(fmt, fmt_size, args, args, mod,
>>>> -				 MAX_TRACE_PRINTK_VARARGS);
>>>> +	ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args,
>>>> +				  MAX_TRACE_PRINTK_VARARGS);
>>>>  	if (ret < 0)
>>>>  		return ret;
>>>>
>>>> -	ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
>>>> -		BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));
>>>> -	/* snprintf() will not append null for zero-length strings */
>>>> -	if (ret == 0)
>>>> -		buf[0] = '\0';
>>>> +	ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
>>>>
>>>>  	raw_spin_lock_irqsave(&trace_printk_lock, flags);
>>>>  	trace_bpf_trace_printk(buf);
>>>>  	raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
>>>>
>>>> Why isn't the write to buf[] protected by that spinlock? Or put another
>>>> way, what protects buf[] from concurrent writes?
>>>
>>> You're right, that is a bug, I missed that buf was static and thought
>>> it was just on the stack. That snprintf call should be after the
>>> raw_spin_lock_irqsave. I'll send a patch. Thank you Rasmus. (before my
>>> snprintf series, there was a vsprintf after the raw_spin_lock_irqsave)
>
> Solved now
>
>> Can you please also clean up unnecessary ()s you added in at least a
>> few places. Thanks.
>
> Alexei said he took care of this .:)
>
>>>> Probably the test cases are not run in parallel, but this is the kind of
>>>> thing that would give those symptoms.
>>>
>>> I think it's a separate issue from what Andrii reported though because
>>> the flaky test exercises the bpf_snprintf helper and this buf spinlock
>>> bug you just found only affects the bpf_trace_printk helper.
>>>
>>> That being said, it does smell a little bit like a concurrency issue
>>> too, indeed. The bpf_snprintf test program is a raw_tp/sys_enter so it
>>> attaches to all syscall entries and most likely gets executed many
>>> more times than necessary and probably on parallel CPUs. The "pad_out"
>>> buffer they write to is unique and not locked so maybe the test's
>>> userspace reads pad_out while another CPU is writing on it and if the
>>> string output goes through a stage where it is " 4 0000" before
>>> being " 4 000", we might read at the wrong time. That being said, I
>>> would find it weird that this happens as much as 50% of the time and
>>> always specifically on that test case.
>>>
>>> Andrii could you maybe try changing the prog type to
>>> "tp/syscalls/sys_enter_nanosleep" on the machine where you can
>>> reproduce this bug ?
>>
>> Yes, it helps. I can't repro it easily anymore.
>
> Good, so it does sound like a concurrency issue indeed
>
>> I think the right fix, though, should be to filter by tid, not change the tracepoint.
>
> Agreed, I'll send a patch for that today. :)
>
>> I think what's happening is we see the string right before bstr_printf
>> does zero-termination with end[-1] = '\0'; So in some cases we see
>> truncated string, in others we see untruncated one.
>
> Makes sense but I still wonder why it happens so often (50% of the
> time is really a lot) and why it is consistently that one test case
> that fails and not the "overflow" case for example. But I'm confident
> that this is not a bug in the helper now and that the tid filter will
> fix the test.
>

If the caller, or one of its sibling threads, inspects the buffer before
(v)snprintf has returned it's very obviously a bug in the caller.

As for why that particular case exposes the race: It seems to be the
only one that "expects" an insanely long result, and it takes a very,
very long time (several cycles per byte I'd assume) for the vsnprintf
code to very carefully go through the

	if (buf < end)
		*buf = /* '0' or ' ' or whatever it is it is emitting here */;
	buf++;

900k times. So there's simply a very large window where the buffer
contents are " 4 0000" while number() is still 'emitting' 0s until
control returns to vsnprintf() which does that final end[-1] = '\0'.

Rasmus
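
For illustration only, here is a minimal userspace sketch of the window
described above. It is not the kernel's vsnprintf()/number() code: the
names model_emit_padded() and model_terminate() are hypothetical, and the
model uses an index rather than the kernel's buf/end pointers so that it
never forms an out-of-range pointer. It assumes a single zero-padded
field whose width is far larger than the buffer, and splits the work into
the long "emit" phase and the final termination step so the intermediate
state can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, phase 1: emit a zero-padded field of `width`
 * characters ending in `digit`. Stores are skipped once the buffer is
 * full, but the position keeps advancing so the full length is known.
 * Nothing is NUL-terminated here.
 */
static size_t model_emit_padded(char *buf, size_t size, size_t width, char digit)
{
	size_t pos = 0;

	assert(buf != NULL);

	/* Emit width - 1 padding zeros, bounded by the buffer size. */
	for (size_t i = 0; i + 1 < width; i++) {
		if (pos < size)
			buf[pos] = '0';
		pos++;
	}

	/* Emit the final digit with the same bounded-store pattern. */
	if (pos < size)
		buf[pos] = digit;
	pos++;

	return pos;
}

/*
 * Hypothetical model, phase 2: the final termination step. If the output
 * was truncated, the last byte of the buffer is overwritten with NUL.
 */
static void model_terminate(char *buf, size_t size, size_t pos)
{
	if (size == 0)
		return;
	if (pos < size)
		buf[pos] = '\0';
	else
		buf[size - 1] = '\0';
}
```

Between the two phases the buffer is completely filled with output
characters and has no terminator, so a reader that looks at it in that
window sees one character more than the final, truncated string. With a
width of 900k the first phase is where almost all of the time goes, which
is why the window is so easy to hit.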