On 10/22/2024 9:45 AM, Byeonguk Jeong wrote:
> trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> full paths from the root to leaves. For example, consider a trie with
> max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@xxxxxxxxx>
> ---

Tested-by: Hou Tao <houtao1@xxxxxxxxxx>

Without the fix, there will be a KASAN report as shown below when dumping
all keys in the lpm-trie through bpf_map_get_next_key().

However, I have a dumb question: does it make sense to reject the element
with prefixlen = 0? I can't think of a use case where a zero-length
prefix would be useful.

==================================================================
BUG: KASAN: slab-out-of-bounds in trie_get_next_key+0x133/0x530
Write of size 8 at addr ffff8881076c2fc0 by task test_lpm_trie.b/446

CPU: 0 UID: 0 PID: 446 Comm: test_lpm_trie.b Not tainted 6.11.0+ #52
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
Call Trace:
 <TASK>
 dump_stack_lvl+0x6e/0xb0
 print_report+0xce/0x610
 ? trie_get_next_key+0x133/0x530
 ? kasan_complete_mode_report_info+0x3c/0x200
 ? trie_get_next_key+0x133/0x530
 kasan_report+0x9c/0xd0
 ? trie_get_next_key+0x133/0x530
 __asan_store8+0x81/0xb0
 trie_get_next_key+0x133/0x530
 __sys_bpf+0x1b03/0x3140
 ? __pfx___sys_bpf+0x10/0x10
 ? __pfx_vfs_write+0x10/0x10
 ? find_held_lock+0x8e/0xb0
 ? ksys_write+0xee/0x180
 ? syscall_exit_to_user_mode+0xb3/0x220
 ? mark_held_locks+0x28/0x90
 ? mark_held_locks+0x28/0x90
 __x64_sys_bpf+0x45/0x60
 x64_sys_call+0x1b2a/0x20d0
 do_syscall_64+0x5d/0x100
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f9c5e9c9c5d
......
 </TASK>

Allocated by task 446:
 kasan_save_stack+0x28/0x50
 kasan_save_track+0x14/0x30
 kasan_save_alloc_info+0x36/0x40
 __kasan_kmalloc+0x84/0xa0
 __kmalloc_noprof+0x214/0x540
 trie_get_next_key+0xa7/0x530
 __sys_bpf+0x1b03/0x3140
 __x64_sys_bpf+0x45/0x60
 x64_sys_call+0x1b2a/0x20d0
 do_syscall_64+0x5d/0x100
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The buggy address belongs to the object at ffff8881076c2f80
 which belongs to the cache kmalloc-rnd-09-64 of size 64
The buggy address is located 0 bytes to the right of
 allocated 64-byte region [ffff8881076c2f80, ffff8881076c2fc0)

>  kernel/bpf/lpm_trie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0218a5132ab5..9b60eda0f727 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
>  	if (!key || key->prefixlen > trie->max_prefixlen)
>  		goto find_leftmost;
>
> -	node_stack = kmalloc_array(trie->max_prefixlen,
> +	node_stack = kmalloc_array(trie->max_prefixlen + 1,
>  				   sizeof(struct lpm_trie_node *),
>  				   GFP_ATOMIC | __GFP_NOWARN);
>  	if (!node_stack)
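
As a rough illustration of the off-by-one described in the commit message, here is a minimal userspace sketch. It is not the kernel code: struct node_stack, push() and walk_overflows() are hypothetical names used only for this model. It assumes one node exists for every prefix length from 0 to max_prefixlen (0x00/0 ... 0x00/8), so a full root-to-leaf walk pushes max_prefixlen + 1 entries. With 8-byte pointers, an 8-slot stack is the 64-byte region in the KASAN report, and the ninth push is the 8-byte write located 0 bytes to the right of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the node stack: it tracks only slot counts. */
struct node_stack {
	size_t capacity;	/* number of slots allocated */
	size_t top;		/* number of slots in use */
	int overflowed;		/* set when a push would go past the end */
};

static void push(struct node_stack *s)
{
	if (s->top == s->capacity)
		s->overflowed = 1;	/* the kernel would write out of bounds here */
	else
		s->top++;
}

/*
 * Model a walk along a full path: one node per prefix length
 * 0, 1, ..., max_prefixlen, i.e. max_prefixlen + 1 pushes in total.
 * Returns 1 if the walk overflows a stack with `capacity` slots.
 */
static int walk_overflows(size_t capacity, size_t max_prefixlen)
{
	struct node_stack s = { capacity, 0, 0 };

	for (size_t plen = 0; plen <= max_prefixlen; plen++)
		push(&s);
	return s.overflowed;
}

/* Self-check for the max_prefixlen = 8 case from the commit message. */
static void self_check(void)
{
	assert(walk_overflows(8, 8) == 1);	/* old size: max_prefixlen */
	assert(walk_overflows(8 + 1, 8) == 0);	/* fixed size: max_prefixlen + 1 */
}
```

Calling self_check() from a small main() exercises both sizes. The model only counts slots, so it says nothing about which nodes the real trie_get_next_key() pushes in other trie shapes.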