Re: [PATCH bpf] bpf/arena: fix softlockup in arena_map_free on 64k page kernel

On 04/02/2025 10:10, Alexei Starovoitov wrote:
> On Tue, Feb 4, 2025 at 9:25 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>>
>> On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y (64k pages),
>> arena_htab tests cause a segmentation fault and soft lockup.
>>
>> $ sudo ./test_progs -t arena_htab
>> Caught signal #11!
>> Stack trace:
>> ./test_progs(crash_handler+0x1c)[0x7bd4d8]
>> linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb34a0968]
>> ./test_progs[0x420f74]
>> ./test_progs(htab_lookup_elem+0x3c)[0x421090]
>> ./test_progs[0x421320]
>> ./test_progs[0x421bb8]
>> ./test_progs(test_arena_htab+0x40)[0x421c14]
>> ./test_progs[0x7bda84]
>> ./test_progs(main+0x65c)[0x7bf670]
>> /usr/lib64/libc.so.6(+0x2caa0)[0xffffb31ecaa0]
>> /usr/lib64/libc.so.6(__libc_start_main+0x98)[0xffffb31ecb78]
>> ./test_progs(_start+0x30)[0x41b4f0]
>>
>> Message from syslogd@bpfol9aarch64 at Feb  4 08:50:09 ...
>>  kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u8:4:7589]
>>
>> The same failure is not observed with 4k pages on aarch64.
>>
>> Investigating further, it turns out arena_map_free() was calling
>> apply_to_existing_page_range() with the address returned by
>> bpf_arena_get_kern_vm_start().  If this address is not page-aligned -
>> as is the case for a 64k page kernel - we wind up calling apply_to_pte_range()
>> with that unaligned address.  The problem is apply_to_pte_range() implicitly
>> assumes that the addr passed in is page-aligned, specifically in this loop:
>>
>>                 do {
>>                         if (create || !pte_none(ptep_get(pte))) {
>>                                 err = fn(pte++, addr, data);
>>                                 if (err)
>>                                         break;
>>                         }
>>                 } while (addr += PAGE_SIZE, addr != end);
>>
>> If addr is _not_ page-aligned, it will never equal end exactly.
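
To illustrate the loop behaviour described above, here is a minimal user-space sketch, not kernel code. sim_walk_terminates() and its max_steps cap are hypothetical, added only so the model can report non-termination instead of spinning. It assumes end is a page-aligned boundary, as it would be when the walk is clipped at a page-table boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the do/while walk in apply_to_pte_range().
 * sim_walk_terminates() is a hypothetical helper, not a kernel function.
 * Returns true if addr lands exactly on end within max_steps iterations,
 * false if the step budget runs out first.
 */
static bool sim_walk_terminates(uint64_t addr, uint64_t end,
                                uint64_t page_size, unsigned int max_steps)
{
    unsigned int steps = 0;

    do {
        if (++steps > max_steps)
            return false;   /* budget exhausted; addr never equalled end */
        addr += page_size;  /* mirrors "addr += PAGE_SIZE" in the kernel loop */
    } while (addr != end);  /* equality test, so stepping past end is missed */

    return true;
}
```

With 64k pages, a start of 0x8000 (32k into a page) and an end of 0x40000 steps through 0x18000, 0x28000, 0x38000, 0x48000 and so on, skipping over end. The same start with 4k pages is page-aligned and reaches end normally.
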
>>
>> One solution is to round up the address returned by bpf_arena_get_kern_vm_start()
>> to a page-aligned value.  With that change in place the test passes:
>>
>> $ sudo ./test_progs -t arena_htab
>> Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED
>>
>> Reported-by: Colm Harrington <colm.harrington@xxxxxxxxxx>
>> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
>> ---
>>  kernel/bpf/arena.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
>> index 870aeb51d70a..07395c55833e 100644
>> --- a/kernel/bpf/arena.c
>> +++ b/kernel/bpf/arena.c
>> @@ -54,7 +54,7 @@ struct bpf_arena {
>>
>>  u64 bpf_arena_get_kern_vm_start(struct bpf_arena *arena)
>>  {
>> -       return arena ? (u64) (long) arena->kern_vm->addr + GUARD_SZ / 2 : 0;
>> +       return arena ? (u64) round_up((long) arena->kern_vm->addr + GUARD_SZ / 2, PAGE_SIZE) : 0;
> 
> Thanks for the report. The fix is incorrect though.
> GUARD_SZ/2 is 32k,
> so with roundup the upper guard is gone.
> We probably need to:
> -#define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8)
> +#define GUARD_SZ round_up(1ull << sizeof_field(struct bpf_insn, off) * 8, PAGE_SIZE << 1)
>

I tested this and it also resolves the test failure and soft lockup. I'll
wait a bit, then follow up with a v2 incorporating your fix if there are
no further suggestions for refinements. Thanks!
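
For reference, the arithmetic behind the GUARD_SZ change can be checked with a small user-space sketch. The helper names below (round_up_pow2, guard_sz_orig, guard_sz_proposed, half_guard_page_aligned) are made up for illustration and are not kernel APIs. The sketch assumes the off field of struct bpf_insn is 16 bits wide, which is what makes the original GUARD_SZ 64k.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's round_up(); y must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t y)
{
    return (x + y - 1) & ~(y - 1);
}

/* Assumption: bpf_insn.off is 16 bits, so the span is 1 << (2 * 8) = 64k. */
#define INSN_OFF_SPAN (1ull << (sizeof(int16_t) * 8))

/* Original definition: 64k regardless of page size. */
static uint64_t guard_sz_orig(void)
{
    return INSN_OFF_SPAN;
}

/* Proposed definition: rounded up to a multiple of two pages. */
static uint64_t guard_sz_proposed(uint64_t page_size)
{
    return round_up_pow2(INSN_OFF_SPAN, page_size << 1);
}

/* Nonzero if GUARD_SZ / 2 is a whole number of pages. */
static int half_guard_page_aligned(uint64_t guard_sz, uint64_t page_size)
{
    return ((guard_sz / 2) % page_size) == 0;
}
```

With 64k pages the original GUARD_SZ / 2 is 32k, which is not page-aligned. The proposed definition gives 128k, so each half is exactly one 64k page. With 4k pages both definitions give 64k, and the 32k half is already page-aligned.
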

Alan

> Better ideas?
> 
> pw-bot: cr
> 