Re: [PATCH 1/2] bpf: Disable preemption in bpf_perf_event_output

On Mon, Jul 17, 2023 at 4:17 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> The nesting protection in bpf_perf_event_output relies on disabled
> preemption, which is guaranteed for kprobes and tracepoints.

I don't understand how you came to that conclusion.
bpf_perf_event_output needs migration disabled and doesn't mind
being preempted.
That's what the nested counter is for.
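
Roughly, the pattern is this (minimal userspace sketch, illustrative
only -- the kernel keeps one instance per CPU and uses this_cpu_* ops,
and the names below are made up):

  #include <stdio.h>

  #define MAX_NEST 3

  static char bufs[MAX_NEST][64];   /* one buffer per nesting level */
  static int nest_level;            /* per CPU in the kernel */

  static char *get_buf(void)
  {
          if (nest_level >= MAX_NEST)
                  return NULL;      /* nested too deep: drop the event */
          return bufs[nest_level++];
  }

  static void put_buf(void)
  {
          nest_level--;
  }

  static void output(int depth)
  {
          char *buf = get_buf();

          if (!buf)
                  return;
          snprintf(buf, sizeof(bufs[0]), "depth %d", depth);
          if (depth < 2)
                  output(depth + 1); /* nested call (think irq) gets its own buffer */
          printf("%s\n", buf);       /* outer buffer is still intact */
          put_buf();
  }

  int main(void)
  {
          output(0);
          return 0;
  }

Each nesting level on a CPU gets its own buffer; that is what the
counter is protecting.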

The stack trace also doesn't look related to that.
It looks more like stack corruption in perf_output_sample.

Do you have
commit eb81a2ed4f52 ("perf/core: Fix perf_output_begin parameter is
incorrectly invoked in perf_event_bpf_output")
in your kernel?

> However, bpf_perf_event_output can also be called from the uprobe
> context through the bpf_prog_run_array_sleepable function, which
> disables migration but keeps preemption enabled.
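
(For context, the sleepable run path used for uprobes looks roughly
like this -- a paraphrased sketch of bpf_prog_run_array_sleepable(),
details vary across kernel versions:

  rcu_read_lock_trace();
  migrate_disable();           /* task is pinned to this CPU ...      */
  ret = run_prog(prog, ctx);   /* ... but can still be preempted here */
  migrate_enable();
  rcu_read_unlock_trace();

so, unlike the kprobe and tracepoint paths, the program runs with
preemption enabled.)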
>
> This can cause a task to be preempted by another one inside the
> nesting protection, eventually leading to two tasks using the same
> perf_sample_data buffer and crashes like:
>
>   kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>   BUG: unable to handle page fault for address: ffffffff82be3eea
>   ...
>   Call Trace:
>    ? __die+0x1f/0x70
>    ? page_fault_oops+0x176/0x4d0
>    ? exc_page_fault+0x132/0x230
>    ? asm_exc_page_fault+0x22/0x30
>    ? perf_output_sample+0x12b/0x910
>    ? perf_event_output+0xd0/0x1d0
>    ? bpf_perf_event_output+0x162/0x1d0
>    ? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
>    ? __uprobe_perf_func+0x12b/0x540
>    ? uprobe_dispatcher+0x2c4/0x430
>    ? uprobe_notify_resume+0x2da/0xce0
>    ? atomic_notifier_call_chain+0x7b/0x110
>    ? exit_to_user_mode_prepare+0x13e/0x290
>    ? irqentry_exit_to_user_mode+0x5/0x30
>    ? asm_exc_int3+0x35/0x40
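
(If I read the commit message right, the interleaving it assumes is
something like this -- one CPU, migration disabled, preemption on:

   task A: nest_level 0 -> 1, takes sds[0], preempted mid-use
   task B: nest_level 1 -> 2, takes sds[1], preempted mid-use
   task A: resumes, finishes, nest_level 2 -> 1
   task A: next invocation: nest_level 1 -> 2, takes sds[1]
           -> A and B now both use sds[1])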
>
> Fix this by disabling preemption in bpf_perf_event_output.
>
> Fixes: 9594dc3c7e71 ("bpf: fix nested bpf tracepoints with per-cpu data")
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
>  kernel/trace/bpf_trace.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index c92eb8c6ff08..2a6ba05d8aee 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -661,8 +661,7 @@ static DEFINE_PER_CPU(int, bpf_trace_nest_level);
>  BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map,
>            u64, flags, void *, data, u64, size)
>  {
> -       struct bpf_trace_sample_data *sds = this_cpu_ptr(&bpf_trace_sds);
> -       int nest_level = this_cpu_inc_return(bpf_trace_nest_level);
> +       struct bpf_trace_sample_data *sds;
>         struct perf_raw_record raw = {
>                 .frag = {
>                         .size = size,
> @@ -670,7 +669,12 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map,
>                 },
>         };
>         struct perf_sample_data *sd;
> -       int err;
> +       int nest_level, err;
> +
> +       preempt_disable();
> +
> +       sds = this_cpu_ptr(&bpf_trace_sds);
> +       nest_level = this_cpu_inc_return(bpf_trace_nest_level);
>
>         if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(sds->sds))) {
>                 err = -EBUSY;
> @@ -691,6 +695,7 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map,
>
>  out:
>         this_cpu_dec(bpf_trace_nest_level);
> +       preempt_enable();
>         return err;
>  }
>
> --
> 2.41.0
>
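
For reference, the BPF side of a reproducer is just a uprobe program
calling the helper; a hypothetical minimal sketch (names and map
layout are illustrative, not taken from the report):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
          __uint(key_size, sizeof(int));
          __uint(value_size, sizeof(int));
  } events SEC(".maps");

  SEC("uprobe")
  int krava1(void *ctx)
  {
          __u64 val = bpf_ktime_get_ns();

          /* the call seen in the stack trace above */
          bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                                &val, sizeof(val));
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";

Attached to a hot userspace function from multiple threads, this runs
bpf_perf_event_output from the preemptible uprobe context.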




