On Mon, Apr 24, 2023 at 9:12 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > The kernel will panic as follows when attaching fexit to mm_init, > > [ 86.549700] ------------[ cut here ]------------ > [ 86.549712] BUG: kernel NULL pointer dereference, address: 0000000000000078 > [ 86.549713] #PF: supervisor read access in kernel mode > [ 86.549715] #PF: error_code(0x0000) - not-present page > [ 86.549716] PGD 10308f067 P4D 10308f067 PUD 11754e067 PMD 0 > [ 86.549719] Oops: 0000 [#1] PREEMPT SMP NOPTI > [ 86.549722] CPU: 9 PID: 9829 Comm: main_amd64 Kdump: loaded Not tainted 6.3.0-rc6+ #12 > [ 86.549725] RIP: 0010:check_preempt_wakeup+0xd1/0x310 > [ 86.549754] Call Trace: > [ 86.549755] <TASK> > [ 86.549757] check_preempt_curr+0x5e/0x70 > [ 86.549761] ttwu_do_activate+0xab/0x350 > [ 86.549763] try_to_wake_up+0x314/0x680 > [ 86.549765] wake_up_process+0x15/0x20 > [ 86.549767] insert_work+0xb2/0xd0 > [ 86.549772] __queue_work+0x20a/0x400 > [ 86.549774] queue_work_on+0x7b/0x90 > [ 86.549778] drm_fb_helper_sys_imageblit+0xd7/0xf0 [drm_kms_helper] > [ 86.549801] drm_fbdev_fb_imageblit+0x5b/0xb0 [drm_kms_helper] > [ 86.549813] soft_cursor+0x1cb/0x250 > [ 86.549816] bit_cursor+0x3ce/0x630 > [ 86.549818] fbcon_cursor+0x139/0x1c0 > [ 86.549821] ? __pfx_bit_cursor+0x10/0x10 > [ 86.549822] hide_cursor+0x31/0xd0 > [ 86.549825] vt_console_print+0x477/0x4e0 > [ 86.549828] console_flush_all+0x182/0x440 > [ 86.549832] console_unlock+0x58/0xf0 > [ 86.549834] vprintk_emit+0x1ae/0x200 > [ 86.549837] vprintk_default+0x1d/0x30 > [ 86.549839] vprintk+0x5c/0x90 > [ 86.549841] _printk+0x58/0x80 > [ 86.549843] __warn_printk+0x7e/0x1a0 > [ 86.549845] ? trace_preempt_off+0x1b/0x70 > [ 86.549848] ? trace_preempt_on+0x1b/0x70 > [ 86.549849] ? __percpu_counter_init+0x8e/0xb0 > [ 86.549853] refcount_warn_saturate+0x9f/0x150 > [ 86.549855] mm_init+0x379/0x390 > [ 86.549859] bpf_trampoline_6442453440_0+0x23/0x1000 > [ 86.549862] mm_init+0x5/0x390 > [ 86.549865] ? mm_alloc+0x4e/0x60 > [ 86.549866] alloc_bprm+0x8a/0x2e0 > [ 86.549869] do_execveat_common.isra.0+0x67/0x240 > [ 86.549872] __x64_sys_execve+0x37/0x50 > [ 86.549874] do_syscall_64+0x38/0x90 > [ 86.549877] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > The reason is that when we attach the btf id of the function mm_init we > actually attach the mm_init defined in init/main.c rather than the > function defined in kernel/fork.c. That can be proved by parsing > /sys/kernel/btf/vmlinux: > > [2493] FUNC 'initcall_blacklist' type_id=2477 linkage=static > [2494] FUNC_PROTO '(anon)' ret_type_id=21 vlen=1 > 'buf' type_id=57 > [2495] FUNC 'early_randomize_kstack_offset' type_id=2494 linkage=static > [2496] FUNC 'mm_init' type_id=118 linkage=static > [2497] FUNC 'trap_init' type_id=118 linkage=static > [2498] FUNC 'thread_stack_cache_init' type_id=118 linkage=static > > From the above information we can find that the FUNCs above and below > mm_init are all defined in init/main.c. So there's no doubt that the > mm_init is also the function defined in init/main.c. > > So when a task calls mm_init and thus the bpf trampoline is triggered it > will use the information of the mm_init defined in init/main.c. Then the > panic will occur. > > It seems that there're issues in btf, for example it is unnecessary to > generate btf for the functions annonated with __init. We need to improve > btf. However we also need to change the function defined in > kernel/fork.c to task_mm_init to better distinguish them. After it is > renamed to task_mm_init, the /sys/kernel/btf/vmlinux will be: > > [13970] FUNC 'mm_alloc' type_id=13969 linkage=static > [13971] FUNC_PROTO '(anon)' ret_type_id=204 vlen=3 > 'mm' type_id=204 > 'p' type_id=197 > 'user_ns' type_id=452 > [13972] FUNC 'task_mm_init' type_id=13971 linkage=static > [13973] FUNC 'coredump_filter_setup' type_id=3804 linkage=static > [13974] FUNC_PROTO '(anon)' ret_type_id=197 vlen=2 > 'orig' type_id=197 > 'node' type_id=21 > [13975] FUNC 'dup_task_struct' type_id=13974 linkage=static > > And then attaching task_mm_init won't panic. Improving the btf will be > handled later. We're not going to hack the kernel to workaround pahole issue. Let's fix pahole instead. cc-ing Alan for ideas.