Re: [PATCH/RFC 2/2] bpf: Disable preemption in bpf_event_output

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 17, 2023 at 4:18 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> We received report [1] of kernel crash, which is caused by
> using nesting protection without disabled preemption.

Same question. I don't see why it would be preemption related.

> The bpf_event_output can be called by programs executed by
> bpf_prog_run_array_cg function that disabled migration but
> keeps preemption enabled.
>
> This can cause task to be preempted by another one inside the
> nesting protection and lead eventually to two tasks using same
> perf_sample_data buffer and cause crashes like:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000001
>   #PF: supervisor instruction fetch in kernel mode
>   #PF: error_code(0x0010) - not-present page
>   ...
>   ? perf_output_sample+0x12a/0x9a0
>   ? finish_task_switch.isra.0+0x81/0x280
>   ? perf_event_output+0x66/0xa0
>   ? bpf_event_output+0x13a/0x190
>   ? bpf_event_output_data+0x22/0x40
>   ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
>   ? xa_load+0x87/0xe0
>   ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
>   ? release_sock+0x3e/0x90
>   ? sk_setsockopt+0x1a1/0x12f0
>   ? udp_pre_connect+0x36/0x50
>   ? inet_dgram_connect+0x93/0xa0
>   ? __sys_connect+0xb4/0xe0
>   ? udp_setsockopt+0x27/0x40
>   ? __pfx_udp_push_pending_frames+0x10/0x10
>   ? __sys_setsockopt+0xdf/0x1a0
>   ? __x64_sys_connect+0xf/0x20
>   ? do_syscall_64+0x3a/0x90
>   ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Fixing this by disabling preemption in bpf_event_output.
>
> [1] https://github.com/cilium/cilium/issues/26756
> Reported-by:  Oleg "livelace" Popov <o.popov@xxxxxxxxxxx>
> Fixes: 768fb61fcc13 ("bpf: Fix bpf_event_output re-entry issue")
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
>  kernel/trace/bpf_trace.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 2a6ba05d8aee..36fb6e483952 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -720,7 +720,6 @@ static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_misc_sds);
>  u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
>                      void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy)
>  {
> -       int nest_level = this_cpu_inc_return(bpf_event_output_nest_level);
>         struct perf_raw_frag frag = {
>                 .copy           = ctx_copy,
>                 .size           = ctx_size,
> @@ -737,8 +736,13 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
>         };
>         struct perf_sample_data *sd;
>         struct pt_regs *regs;
> +       int nest_level;
>         u64 ret;
>
> +       preempt_disable();
> +
> +       nest_level = this_cpu_inc_return(bpf_event_output_nest_level);
> +
>         if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) {
>                 ret = -EBUSY;
>                 goto out;
> @@ -753,6 +757,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
>         ret = __bpf_perf_event_output(regs, map, flags, sd);
>  out:
>         this_cpu_dec(bpf_event_output_nest_level);
> +       preempt_enable();
>         return ret;
>  }
>
> --
> 2.41.0
>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux