On Tue, Oct 22, 2024 at 12:51:05PM -0700, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@xxxxxxxxx> wrote:
> >
> > trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> > while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> > full paths from the root to leaves. For example, consider a trie with
> > max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> > 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> > .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
>
> Hmm. It sounds possible, but pls demonstrate it with a selftest.
> With the amount of fuzzing I'm surprised it was not discovered earlier.
>
> pw-bot: cr

With the simple test below, the kernel crashes within a minute, and on
KFENCE-enabled kernels the bug is easy to detect.

#!/bin/bash
# key 5 = 4-byte prefixlen + 1 data byte, so max_prefixlen is 8.
# flags 0x1 is BPF_F_NO_PREALLOC, which lpm_trie maps require.
bpftool map create /sys/fs/bpf/lpm type lpm_trie key 5 value 1 \
	entries 16 flags 0x1 name lpm

# Insert 0x00/0 .. 0x00/8: a full path of 9 nodes from root to leaf.
for i in {0..8}; do
	bpftool map update pinned /sys/fs/bpf/lpm \
		key hex 0$i 00 00 00 00 \
		value hex 00 any
done

# Each dump walks the map with get_next_key, hitting trie_get_next_key().
while true; do
	bpftool map dump pinned /sys/fs/bpf/lpm
done

In my environment (6.12-rc4, with CONFIG_KFENCE), dmesg gave me this
message, as expected.

[ 463.141394] BUG: KFENCE: out-of-bounds write in trie_get_next_key+0x2f2/0x670

[ 463.143422] Out-of-bounds write at 0x0000000095bc45ea (256B right of kfence-#156):
[ 463.144438]  trie_get_next_key+0x2f2/0x670
[ 463.145439]  map_get_next_key+0x261/0x410
[ 463.146444]  __sys_bpf+0xad4/0x1170
[ 463.147438]  __x64_sys_bpf+0x74/0xc0
[ 463.148431]  do_syscall_64+0x79/0x150
[ 463.149425]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[ 463.151436] kfence-#156: 0x00000000279749c1-0x0000000034dc4abb, size=256, cache=kmalloc-256

[ 463.153414] allocated by task 2021 on cpu 2 at 463.140440s (0.012974s ago):
[ 463.154413]  trie_get_next_key+0x252/0x670
[ 463.155411]  map_get_next_key+0x261/0x410
[ 463.156402]  __sys_bpf+0xad4/0x1170
[ 463.157390]  __x64_sys_bpf+0x74/0xc0
[ 463.158386]  do_syscall_64+0x79/0x150
[ 463.159372]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
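
For reference, the arithmetic behind the reproducer can be sketched with a
few shell helpers. The function names below are made up for illustration and
are not part of bpftool or the kernel. The sketch assumes a little-endian
host (the 4-byte prefixlen is stored in host byte order) and prefix lengths
below 256.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not a bpftool or kernel API.

# An LPM trie key is a 4-byte prefixlen followed by the data bytes,
# so max_prefixlen is the number of bits in the data part of the key.
lpm_max_prefixlen() {
	local key_size=$1
	echo $(( (key_size - 4) * 8 ))
}

# A full path from the root to a leaf has one node for each prefix
# length 0..max_prefixlen, i.e. max_prefixlen + 1 nodes.
lpm_full_path_nodes() {
	local max_prefixlen=$1
	echo $(( max_prefixlen + 1 ))
}

# Print the "key hex" bytes of an all-zero prefix with the given length.
# Assumption: little-endian host and prefixlen < 256.
lpm_zero_key_hex() {
	local prefixlen=$1 key_size=$2 out i
	out=$(printf '%02x 00 00 00' "$prefixlen")
	for (( i = 4; i < key_size; i++ )); do
		out+=" 00"
	done
	echo "$out"
}
```

With key size 5, lpm_max_prefixlen gives 8 and lpm_full_path_nodes gives 9,
which is one more node than a stack sized by max_prefixlen can hold.
lpm_zero_key_hex 8 5 yields the bytes the loop above passes for i=8.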