On Sat, Oct 26, 2024 at 8:48 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> The grace period used internally within tracepoint.c:release_probes()
> uses call_rcu() to batch waiting for quiescence of old probe arrays,
> rather than using the tracepoint_synchronize_unregister() which blocks
> while waiting for quiescence.
>
> With the introduction of faultable syscall tracepoints, this causes
> use-after-free issues reproduced with syzkaller.
>
> Fix this by using the appropriate call_rcu() or call_rcu_tasks_trace()
> before invoking the rcu_free_old_probes callback. This can be chosen
> using the tracepoint_is_syscall() API.
>
> A similar issue exists in bpf use of call_rcu(). Fixing this is left to
> a separate change.
>
> Reported-by: syzbot+b390c8062d8387b6272a@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to handle page faults")
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Yonghong Song <yhs@xxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> Cc: bpf@xxxxxxxxxxxxxxx
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Jordan Rife <jrife@xxxxxxxxxx>
> ---
> Changes since v0:
> - Introduce tracepoint_call_rcu(),
> - Fix bpf_link_free() use of call_rcu as well.
>
> Changes since v1:
> - Use tracepoint_call_rcu() for bpf_prog_put as well.
>
> Changes since v2:
> - Do not cover bpf changes in the same commit, let bpf developers
>   implement it.
> ---
>  kernel/tracepoint.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 5658dc92f5b5..47569fb06596 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -106,13 +106,16 @@ static void rcu_free_old_probes(struct rcu_head *head)
>  	kfree(container_of(head, struct tp_probes, rcu));
>  }
>
> -static inline void release_probes(struct tracepoint_func *old)
> +static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
>  {
>  	if (old) {
>  		struct tp_probes *tp_probes = container_of(old,
>  			struct tp_probes, probes[0]);
>
> -		call_rcu(&tp_probes->rcu, rcu_free_old_probes);
> +		if (tracepoint_is_syscall(tp))
> +			call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);

Should this be a call_rcu_tasks_trace() -> call_rcu() chain instead of
just call_rcu_tasks_trace()? While currently call_rcu_tasks_trace()
implies an RCU grace period (as evidenced by rcu_trace_implies_rcu_gp()
being hardcoded right now to return true), this might not always be the
case in the future, so it's best to have a guarantee that, regardless
of whether the tracepoint is sleepable or not, we'll always have an RCU
grace period, and for a sleepable tracepoint *also* an RCU Tasks Trace
grace period.

> +		else
> +			call_rcu(&tp_probes->rcu, rcu_free_old_probes);
>  	}
>  }
>
> @@ -334,7 +337,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
>  		break;
>  	}
>
> -	release_probes(old);
> +	release_probes(tp, old);
>  	return 0;
>  }
>
> @@ -405,7 +408,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>  		WARN_ON_ONCE(1);
>  		break;
>  	}
> -	release_probes(old);
> +	release_probes(tp, old);
>  	return 0;
>  }
>
> --
> 2.39.5
>
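To make the suggestion concrete, the chaining could look something like
the sketch below. This is only a sketch, not a tested patch: the
intermediate callback rcu_tasks_trace_free_old_probes() is a name I am
inventing here for illustration, and reusing the same rcu_head to queue
the second callback from inside the first relies on the usual rule that
an rcu_head may be re-posted once its callback has been invoked.

```c
/* Sketch: chain a Tasks Trace grace period into a regular RCU grace
 * period before freeing, so both GPs are guaranteed for syscall
 * tracepoints even if rcu_trace_implies_rcu_gp() ever stops being true.
 * rcu_tasks_trace_free_old_probes() is a hypothetical helper, not part
 * of the posted patch.
 */
static void rcu_free_old_probes(struct rcu_head *head)
{
	kfree(container_of(head, struct tp_probes, rcu));
}

static void rcu_tasks_trace_free_old_probes(struct rcu_head *head)
{
	/* The Tasks Trace GP has elapsed; now also wait for a regular
	 * RCU GP before freeing the old probe array.
	 */
	call_rcu(head, rcu_free_old_probes);
}

static inline void release_probes(struct tracepoint *tp,
				  struct tracepoint_func *old)
{
	if (old) {
		struct tp_probes *tp_probes = container_of(old,
			struct tp_probes, probes[0]);

		if (tracepoint_is_syscall(tp))
			call_rcu_tasks_trace(&tp_probes->rcu,
					     rcu_tasks_trace_free_old_probes);
		else
			call_rcu(&tp_probes->rcu, rcu_free_old_probes);
	}
}
```

An alternative would be to keep the single call_rcu_tasks_trace() and
guard it with a check on rcu_trace_implies_rcu_gp(), falling back to
the chained form only when that returns false, which is roughly the
pattern some bpf freeing paths use.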