Re: pahole issue. Re: [PATCH bpf-next 2/2] fork: Rename mm_init to task_mm_init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 9, 2023 at 11:44 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 02/05/2023 04:40, Alexei Starovoitov wrote:
> > Alan,
> >
> > wdyt on below?
> >
>
> apologies, missed this; see below..
>
> > On Thu, Apr 27, 2023 at 4:35 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >>
> >> On Tue, Apr 25, 2023 at 5:13 AM Alexei Starovoitov
> >> <alexei.starovoitov@xxxxxxxxx> wrote:
> >>>
> >>> On Mon, Apr 24, 2023 at 9:12 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >>>>
> >>>> The kernel will panic as follows when attaching fexit to mm_init,
> >>>>
> >>>> [   86.549700] ------------[ cut here ]------------
> >>>> [   86.549712] BUG: kernel NULL pointer dereference, address: 0000000000000078
> >>>> [   86.549713] #PF: supervisor read access in kernel mode
> >>>> [   86.549715] #PF: error_code(0x0000) - not-present page
> >>>> [   86.549716] PGD 10308f067 P4D 10308f067 PUD 11754e067 PMD 0
> >>>> [   86.549719] Oops: 0000 [#1] PREEMPT SMP NOPTI
> >>>> [   86.549722] CPU: 9 PID: 9829 Comm: main_amd64 Kdump: loaded Not tainted 6.3.0-rc6+ #12
> >>>> [   86.549725] RIP: 0010:check_preempt_wakeup+0xd1/0x310
> >>>> [   86.549754] Call Trace:
> >>>> [   86.549755]  <TASK>
> >>>> [   86.549757]  check_preempt_curr+0x5e/0x70
> >>>> [   86.549761]  ttwu_do_activate+0xab/0x350
> >>>> [   86.549763]  try_to_wake_up+0x314/0x680
> >>>> [   86.549765]  wake_up_process+0x15/0x20
> >>>> [   86.549767]  insert_work+0xb2/0xd0
> >>>> [   86.549772]  __queue_work+0x20a/0x400
> >>>> [   86.549774]  queue_work_on+0x7b/0x90
> >>>> [   86.549778]  drm_fb_helper_sys_imageblit+0xd7/0xf0 [drm_kms_helper]
> >>>> [   86.549801]  drm_fbdev_fb_imageblit+0x5b/0xb0 [drm_kms_helper]
> >>>> [   86.549813]  soft_cursor+0x1cb/0x250
> >>>> [   86.549816]  bit_cursor+0x3ce/0x630
> >>>> [   86.549818]  fbcon_cursor+0x139/0x1c0
> >>>> [   86.549821]  ? __pfx_bit_cursor+0x10/0x10
> >>>> [   86.549822]  hide_cursor+0x31/0xd0
> >>>> [   86.549825]  vt_console_print+0x477/0x4e0
> >>>> [   86.549828]  console_flush_all+0x182/0x440
> >>>> [   86.549832]  console_unlock+0x58/0xf0
> >>>> [   86.549834]  vprintk_emit+0x1ae/0x200
> >>>> [   86.549837]  vprintk_default+0x1d/0x30
> >>>> [   86.549839]  vprintk+0x5c/0x90
> >>>> [   86.549841]  _printk+0x58/0x80
> >>>> [   86.549843]  __warn_printk+0x7e/0x1a0
> >>>> [   86.549845]  ? trace_preempt_off+0x1b/0x70
> >>>> [   86.549848]  ? trace_preempt_on+0x1b/0x70
> >>>> [   86.549849]  ? __percpu_counter_init+0x8e/0xb0
> >>>> [   86.549853]  refcount_warn_saturate+0x9f/0x150
> >>>> [   86.549855]  mm_init+0x379/0x390
> >>>> [   86.549859]  bpf_trampoline_6442453440_0+0x23/0x1000
> >>>> [   86.549862]  mm_init+0x5/0x390
> >>>> [   86.549865]  ? mm_alloc+0x4e/0x60
> >>>> [   86.549866]  alloc_bprm+0x8a/0x2e0
> >>>> [   86.549869]  do_execveat_common.isra.0+0x67/0x240
> >>>> [   86.549872]  __x64_sys_execve+0x37/0x50
> >>>> [   86.549874]  do_syscall_64+0x38/0x90
> >>>> [   86.549877]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> >>>>
> >>>> The reason is that when we attach the btf id of the function mm_init we
> >>>> actually attach the mm_init defined in init/main.c rather than the
> >>>> function defined in kernel/fork.c. That can be proved by parsing
> >>>> /sys/kernel/btf/vmlinux:
> >>>>
> >>>> [2493] FUNC 'initcall_blacklist' type_id=2477 linkage=static
> >>>> [2494] FUNC_PROTO '(anon)' ret_type_id=21 vlen=1
> >>>>         'buf' type_id=57
> >>>> [2495] FUNC 'early_randomize_kstack_offset' type_id=2494 linkage=static
> >>>> [2496] FUNC 'mm_init' type_id=118 linkage=static
> >>>> [2497] FUNC 'trap_init' type_id=118 linkage=static
> >>>> [2498] FUNC 'thread_stack_cache_init' type_id=118 linkage=static
> >>>>
> >>>> From the above information we can find that the FUNCs above and below
> >>>> mm_init are all defined in init/main.c. So there's no doubt that the
> >>>> mm_init is also the function defined in init/main.c.
> >>>>
> >>>> So when a task calls mm_init and thus the bpf trampoline is triggered it
> >>>> will use the information of the mm_init defined in init/main.c. Then the
> >>>> panic will occur.
> >>>>
> >>>> It seems that there're issues in btf, for example it is unnecessary to
> >>>> generate btf for the functions annonated with __init. We need to improve
> >>>> btf. However we also need to change the function defined in
> >>>> kernel/fork.c to task_mm_init to better distinguish them. After it is
> >>>> renamed to task_mm_init, the /sys/kernel/btf/vmlinux will be:
> >>>>
> >>>> [13970] FUNC 'mm_alloc' type_id=13969 linkage=static
> >>>> [13971] FUNC_PROTO '(anon)' ret_type_id=204 vlen=3
> >>>>         'mm' type_id=204
> >>>>         'p' type_id=197
> >>>>         'user_ns' type_id=452
> >>>> [13972] FUNC 'task_mm_init' type_id=13971 linkage=static
> >>>> [13973] FUNC 'coredump_filter_setup' type_id=3804 linkage=static
> >>>> [13974] FUNC_PROTO '(anon)' ret_type_id=197 vlen=2
> >>>>         'orig' type_id=197
> >>>>         'node' type_id=21
> >>>> [13975] FUNC 'dup_task_struct' type_id=13974 linkage=static
> >>>>
> >>>> And then attaching task_mm_init won't panic. Improving the btf will be
> >>>> handled later.
> >>>
> >>> We're not going to hack the kernel to workaround pahole issue.
> >>> Let's fix pahole instead.
> >>> cc-ing Alan for ideas.
> >>
> >> Any comment on it, Alan ?
> >> I think we can just skip generating BTF for the functions in
> >> __section(".init.text"),  as these functions will be freed after
> >> kernel init. There won't be use cases for them.
> >>
>
> won't the pahole v1.25 changes help here; can you try applying
>
> https://lore.kernel.org/bpf/1675949331-27935-1-git-send-email-alan.maguire@xxxxxxxxxx/
>
> ...and build using pahole; this should eliminate any functions
> with inconsistent prototypes via
>
>       --skip_encoding_btf_inconsistent_proto
>         Do not encode functions with multiple inconsistent prototypes or
>         unexpected register use for their parameters, where  the  regis‐
>         ters used do not match calling conventions.
>
>
> I'll check this at my end too.
>
> Alexei, if this works should we look at applying the above
> again to bpf-next? If so I'll resend the patch.

I've lost the track with pahole fixes.
Did Arnaldo re-tag pahole 1.25 or released 1.26 with the fixes?

Alan,
please submit a fresh patch for bpf-next to enable
--skip_encoding_btf_inconsistent_proto, so it can go through CI.
I cannot test all combinations manually.

Thanks!





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux