Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> · Sat, 6 Apr 2024 12:41:05 +0900

On Tue, 2 Apr 2024 22:21:00 -0700
Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:

> On Tue, Apr 2, 2024 at 9:00 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 2, 2024 at 5:52 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > On Wed, 3 Apr 2024 09:40:48 +0900
> > > Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
> > >
> > > > OK, for me, this last sentence is preferred for the help message. That explains
> > > > what this is for.
> > > >
> > > >         All callbacks that attach to the function tracing have some sort
> > > >         of protection against recursion. This option is only to verify that
> > > >　       ftrace (and other users of ftrace_test_recursion_trylock()) are not
> > > >         called outside of RCU, as if they are, it can cause a race.
> > > >         But it also has a noticeable overhead when enabled.
> >
> > Sounds good to me, I can add this to the description of the Kconfig option.
> >
> > > >
> > > > BTW, how much overhead does this introduce? and the race case a kernel crash?
> >
> > I just checked our fleet-wide production data for the last 24 hours.
> > Within the kprobe/kretprobe code path (ftrace_trampoline and
> > everything called from it), rcu_is_watching (both calls, see below)
> > cause 0.484% CPU cycles usage, which isn't nothing. So definitely we'd
> > prefer to be able to avoid that in production use cases.
> >
> 
> I just ran synthetic microbenchmark testing multi-kretprobe
> throughput. We get (in millions of BPF kretprobe-multi program
> invocations per second):
>   - 5.568M/s as baseline;
>   - 5.679M/s with changes in this patch (+2% throughput improvement);
>   - 5.808M/s with disabling rcu_is_watching in rethook_try_get()
> (+2.3% more vs just one of rcu_is_watching, and +4.3% vs baseline).
> 
> It's definitely noticeable.

Thanks for checking the overhead! Hmm, it is considerable.

> > > > or just messed up the ftrace buffer?
> > >
> > > There's a hypothetical race where it can cause a use after free.

Hmm, so it might not lead a kernel crash but is better to enable with
other debugging options.

> > >
> > > That is, after you shutdown ftrace, you need to call synchronize_rcu_tasks(),
> > > which requires RCU to be watching. There's a theoretical case where that
> > > task calls the trampoline and misses the synchronization. Note, these
> > > locations are with preemption disabled, as rcu is always watching when
> > > preemption is enabled. Thus it would be extremely fast where as the
> > > synchronize_rcu_tasks() is rather slow.
> > >
> > > We also have synchronize_rcu_tasks_rude() which would actually keep the
> > > trace from happening, as it would schedule on each CPU forcing all CPUs to
> > > have RCU watching.
> > >
> > > I have never heard of this race being hit. I guess it could happen on a VM
> > > where a vCPU gets preempted at the right moment for a long time and the
> > > other CPUs synchronize.
> > >
> > > But like lockdep, where deadlocks can crash the kernel, we don't enable it
> > > for production.
> > >
> > > The overhead is another function call within the function tracer. I had
> > > numbers before, but I guess I could run tests again and get new numbers.
> > >
> >
> > I just noticed another rcu_is_watching() call, in rethook_try_get(),
> > which seems to be a similar and complementary validation check to the
> > one we are putting under CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING option
> > in this patch. It feels like both of them should be controlled by the
> > same settings. WDYT? Can I add CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> > guard around rcu_is_watching() check in rethook_try_get() as well?

Hmmm, no, I think it should not change the rethook side because rethook
can be used with kprobes without ftrace. If we can detect it is used from
the ftrace, we can skip it. (From this reason, I would like to remove
return probe from kprobes...)

Thank you,

> >
> >
> > > Thanks,
> > >
> > > -- Steve

-- 
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>