Re: pahole issue. Re: [PATCH bpf-next 2/2] fork: Rename mm_init to task_mm_init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 2, 2023 at 11:40 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> Alan,
>
> wdyt on below?
>

Hi Alexei,

Per my understanding, not only does pahole have issues, but also there
are issues in the kernel.
This panic is caused by the inconsistency between BTF and kallsyms as such:
   bpf_check_attach_target
       tname = btf_name_by_offset(btf, t->name_off); // btf
       addr = kallsyms_lookup_name(tname); // kallsyms

So if the function displayed in /proc/sys/btf/vmlinux is not the same
with the function displayed in /proc/kallsyms, we will get a wrong
addr.  I think it is not proper to rely wholly on the userspace tools
to make them the same. The kernel should also imrpve the verifier to
make sure they are really the same function.  WDYT?

> On Thu, Apr 27, 2023 at 4:35 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 25, 2023 at 5:13 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Apr 24, 2023 at 9:12 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > >
> > > > The kernel will panic as follows when attaching fexit to mm_init,
> > > >
> > > > [   86.549700] ------------[ cut here ]------------
> > > > [   86.549712] BUG: kernel NULL pointer dereference, address: 0000000000000078
> > > > [   86.549713] #PF: supervisor read access in kernel mode
> > > > [   86.549715] #PF: error_code(0x0000) - not-present page
> > > > [   86.549716] PGD 10308f067 P4D 10308f067 PUD 11754e067 PMD 0
> > > > [   86.549719] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > > [   86.549722] CPU: 9 PID: 9829 Comm: main_amd64 Kdump: loaded Not tainted 6.3.0-rc6+ #12
> > > > [   86.549725] RIP: 0010:check_preempt_wakeup+0xd1/0x310
> > > > [   86.549754] Call Trace:
> > > > [   86.549755]  <TASK>
> > > > [   86.549757]  check_preempt_curr+0x5e/0x70
> > > > [   86.549761]  ttwu_do_activate+0xab/0x350
> > > > [   86.549763]  try_to_wake_up+0x314/0x680
> > > > [   86.549765]  wake_up_process+0x15/0x20
> > > > [   86.549767]  insert_work+0xb2/0xd0
> > > > [   86.549772]  __queue_work+0x20a/0x400
> > > > [   86.549774]  queue_work_on+0x7b/0x90
> > > > [   86.549778]  drm_fb_helper_sys_imageblit+0xd7/0xf0 [drm_kms_helper]
> > > > [   86.549801]  drm_fbdev_fb_imageblit+0x5b/0xb0 [drm_kms_helper]
> > > > [   86.549813]  soft_cursor+0x1cb/0x250
> > > > [   86.549816]  bit_cursor+0x3ce/0x630
> > > > [   86.549818]  fbcon_cursor+0x139/0x1c0
> > > > [   86.549821]  ? __pfx_bit_cursor+0x10/0x10
> > > > [   86.549822]  hide_cursor+0x31/0xd0
> > > > [   86.549825]  vt_console_print+0x477/0x4e0
> > > > [   86.549828]  console_flush_all+0x182/0x440
> > > > [   86.549832]  console_unlock+0x58/0xf0
> > > > [   86.549834]  vprintk_emit+0x1ae/0x200
> > > > [   86.549837]  vprintk_default+0x1d/0x30
> > > > [   86.549839]  vprintk+0x5c/0x90
> > > > [   86.549841]  _printk+0x58/0x80
> > > > [   86.549843]  __warn_printk+0x7e/0x1a0
> > > > [   86.549845]  ? trace_preempt_off+0x1b/0x70
> > > > [   86.549848]  ? trace_preempt_on+0x1b/0x70
> > > > [   86.549849]  ? __percpu_counter_init+0x8e/0xb0
> > > > [   86.549853]  refcount_warn_saturate+0x9f/0x150
> > > > [   86.549855]  mm_init+0x379/0x390
> > > > [   86.549859]  bpf_trampoline_6442453440_0+0x23/0x1000
> > > > [   86.549862]  mm_init+0x5/0x390
> > > > [   86.549865]  ? mm_alloc+0x4e/0x60
> > > > [   86.549866]  alloc_bprm+0x8a/0x2e0
> > > > [   86.549869]  do_execveat_common.isra.0+0x67/0x240
> > > > [   86.549872]  __x64_sys_execve+0x37/0x50
> > > > [   86.549874]  do_syscall_64+0x38/0x90
> > > > [   86.549877]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > > >
> > > > The reason is that when we attach the btf id of the function mm_init we
> > > > actually attach the mm_init defined in init/main.c rather than the
> > > > function defined in kernel/fork.c. That can be proved by parsing
> > > > /sys/kernel/btf/vmlinux:
> > > >
> > > > [2493] FUNC 'initcall_blacklist' type_id=2477 linkage=static
> > > > [2494] FUNC_PROTO '(anon)' ret_type_id=21 vlen=1
> > > >         'buf' type_id=57
> > > > [2495] FUNC 'early_randomize_kstack_offset' type_id=2494 linkage=static
> > > > [2496] FUNC 'mm_init' type_id=118 linkage=static
> > > > [2497] FUNC 'trap_init' type_id=118 linkage=static
> > > > [2498] FUNC 'thread_stack_cache_init' type_id=118 linkage=static
> > > >
> > > > From the above information we can find that the FUNCs above and below
> > > > mm_init are all defined in init/main.c. So there's no doubt that the
> > > > mm_init is also the function defined in init/main.c.
> > > >
> > > > So when a task calls mm_init and thus the bpf trampoline is triggered it
> > > > will use the information of the mm_init defined in init/main.c. Then the
> > > > panic will occur.
> > > >
> > > > It seems that there're issues in btf, for example it is unnecessary to
> > > > generate btf for the functions annonated with __init. We need to improve
> > > > btf. However we also need to change the function defined in
> > > > kernel/fork.c to task_mm_init to better distinguish them. After it is
> > > > renamed to task_mm_init, the /sys/kernel/btf/vmlinux will be:
> > > >
> > > > [13970] FUNC 'mm_alloc' type_id=13969 linkage=static
> > > > [13971] FUNC_PROTO '(anon)' ret_type_id=204 vlen=3
> > > >         'mm' type_id=204
> > > >         'p' type_id=197
> > > >         'user_ns' type_id=452
> > > > [13972] FUNC 'task_mm_init' type_id=13971 linkage=static
> > > > [13973] FUNC 'coredump_filter_setup' type_id=3804 linkage=static
> > > > [13974] FUNC_PROTO '(anon)' ret_type_id=197 vlen=2
> > > >         'orig' type_id=197
> > > >         'node' type_id=21
> > > > [13975] FUNC 'dup_task_struct' type_id=13974 linkage=static
> > > >
> > > > And then attaching task_mm_init won't panic. Improving the btf will be
> > > > handled later.
> > >
> > > We're not going to hack the kernel to workaround pahole issue.
> > > Let's fix pahole instead.
> > > cc-ing Alan for ideas.
> >
> > Any comment on it, Alan ?
> > I think we can just skip generating BTF for the functions in
> > __section(".init.text"),  as these functions will be freed after
> > kernel init. There won't be use cases for them.
> >
> > --
> > Regards
> > Yafang



-- 
Regards
Yafang





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux