On 11/9/24 12:14 PM, Alexei Starovoitov wrote:
On Fri, Nov 8, 2024 at 6:53 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
	stack_depth = bpf_prog->aux->stack_depth;
+	if (bpf_prog->aux->priv_stack_ptr) {
+		priv_frame_ptr = bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 16);
+		stack_depth = 0;
+	}
...
+	priv_stack_ptr = prog->aux->priv_stack_ptr;
+	if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
+		priv_stack_ptr = __alloc_percpu_gfp(prog->aux->stack_depth, 16, GFP_KERNEL);
After applying this, I started to see crashes when running test_progs -j, like:
[ 173.465191] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000af9: 0000 [#1] PREEMPT SMP KASAN
[ 173.466053] KASAN: probably user-memory-access in range [0x00000000000057c8-0x00000000000057cf]
[ 173.466053] RIP: 0010:dst_dev_put+0x1e/0x220
[ 173.466053] Call Trace:
[ 173.466053] <IRQ>
[ 173.466053] ? die_addr+0x40/0xa0
[ 173.466053] ? exc_general_protection+0x138/0x1f0
[ 173.466053] ? asm_exc_general_protection+0x26/0x30
[ 173.466053] ? dst_dev_put+0x1e/0x220
[ 173.466053] rt_fibinfo_free_cpus.part.0+0x8c/0x130
[ 173.466053] fib_nh_common_release+0xd6/0x2a0
[ 173.466053] free_fib_info_rcu+0x129/0x360
[ 173.466053] ? rcu_core+0xa55/0x1340
[ 173.466053] rcu_core+0xa55/0x1340
[ 173.466053] ? rcutree_report_cpu_dead+0x380/0x380
[ 173.466053] ? hrtimer_interrupt+0x319/0x7c0
[ 173.466053] handle_softirqs+0x14c/0x4d0
[ 35.134115] Oops: general protection fault, probably for non-canonical address 0xe0000bfff101fbbc: 0000 [#1] PREEMPT SMP KASAN
[ 35.135089] KASAN: probably user-memory-access in range [0x00007fff880fdde0-0x00007fff880fdde7]
[ 35.135089] RIP: 0010:destroy_workqueue+0x4b4/0xa70
[ 35.135089] Call Trace:
[ 35.135089] <TASK>
[ 35.135089] ? die_addr+0x40/0xa0
[ 35.135089] ? exc_general_protection+0x138/0x1f0
[ 35.135089] ? asm_exc_general_protection+0x26/0x30
[ 35.135089] ? destroy_workqueue+0x4b4/0xa70
[ 35.135089] ? destroy_workqueue+0x592/0xa70
[ 35.135089] ? __mutex_unlock_slowpath.isra.0+0x270/0x270
[ 35.135089] ext4_put_super+0xff/0xd70
[ 35.135089] generic_shutdown_super+0x148/0x4c0
[ 35.135089] kill_block_super+0x3b/0x90
[ 35.135089] ext4_kill_sb+0x65/0x90
I think I see the bug... quoted it above...
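If I'm reading the two hunks quoted above right, the JIT computes the private
frame pointer at priv_stack_ptr + round_up(stack_depth, 16), while the per-cpu
buffer is only stack_depth bytes, so frame accesses can land past the end of
the allocation whenever stack_depth isn't a multiple of 16. A sketch of the
kind of fix I'd expect (untested, only to illustrate the size mismatch):

	/* Untested sketch: size the per-cpu stack to match the frame top
	 * the JIT computes, i.e. round_up(stack_depth, 16), instead of
	 * the raw stack_depth.
	 */
	priv_stack_ptr = __alloc_percpu_gfp(round_up(prog->aux->stack_depth, 16),
					    16, GFP_KERNEL);

That would also explain why the corruption shows up in unrelated per-cpu data
(dst entries, workqueues) rather than in anything bpf-related.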
Please make sure you reproduce it first.
Then let's figure out a way to test for such things, and what we can do
to make kasan detect it sooner, since the above crashes have no
indication at all that the bpf priv stack is responsible.
If there is another bug in the priv stack that causes future crashes,
we need to make sure the priv stack corruption is detected by kasan
(or whatever mechanism) earlier.
We cannot land private stack support when there is
a possibility of such silent corruption.
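One way to catch such corruption sooner (a rough sketch; PRIV_STACK_GUARD_SZ
and the two helpers below are made up for illustration, not existing APIs)
would be to over-allocate the per-cpu stack with a guard region at each end,
seed canary words into the guards, and verify them when the prog is freed, so
an overflow shows up at prog teardown instead of as a random crash elsewhere:

	/* Sketch: reserve PRIV_STACK_GUARD_SZ bytes of guard on both ends
	 * of the per-cpu stack and seed a canary word in each guard.
	 */
	#define PRIV_STACK_GUARD_SZ	16
	#define PRIV_STACK_GUARD_VAL	0xEB9F12345678EB9FULL

	static void priv_stack_init_guard(void __percpu *ptr, int alloc_size)
	{
		int cpu, top_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;

		for_each_possible_cpu(cpu) {
			u64 *stack = per_cpu_ptr(ptr, cpu);

			stack[0] = PRIV_STACK_GUARD_VAL;
			stack[top_idx] = PRIV_STACK_GUARD_VAL;
		}
	}

	static void priv_stack_check_guard(void __percpu *ptr, int alloc_size,
					   struct bpf_prog *prog)
	{
		int cpu, top_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;

		for_each_possible_cpu(cpu) {
			u64 *stack = per_cpu_ptr(ptr, cpu);

			if (stack[0] != PRIV_STACK_GUARD_VAL ||
			    stack[top_idx] != PRIV_STACK_GUARD_VAL)
				pr_err("BPF private stack overflow/underflow detected for prog %s\n",
				       prog->aux->name);
		}
	}

The JIT would then point the private frame pointer at the region between the
two guards; kasan could additionally poison the guard bytes so the
overflowing access itself faults, rather than only being noticed at free time.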
I can reproduce it now by running the tests multiple times.
I will debug this ASAP.
pw-bot: cr