"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> writes: > On Tue, Apr 09, 2024 at 07:45:54AM +0000, Puranjay Mohan wrote: >> "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> writes: >> >> > On Fri, Apr 05, 2024 at 10:50:30AM -0700, Andrii Nakryiko wrote: >> >> On Fri, Apr 5, 2024 at 9:30 AM Alexei Starovoitov >> >> <alexei.starovoitov@xxxxxxxxx> wrote: >> >> > >> >> > On Fri, Apr 5, 2024 at 4:36 AM Russell King (Oracle) >> >> > <linux@xxxxxxxxxxxxxxx> wrote: >> >> > > >> >> > > On Fri, Apr 05, 2024 at 12:02:36PM +0100, Mark Rutland wrote: >> >> > > > On Thu, Apr 04, 2024 at 03:57:04PM -0700, Alexei Starovoitov wrote: >> >> > > > > On Wed, Apr 3, 2024 at 6:56 PM Andrew Morton <akpm@linux-foundationorg> wrote: >> >> > > > > > >> >> > > > > > On Mon, 01 Apr 2024 22:19:25 -0700 syzbot <syzbot+186522670e6722692d86@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote: >> >> > > > > > >> >> > > > > > > Hello, >> >> > > > > > >> >> > > > > > Thanks. Cc: bpf@xxxxxxxxxxxxxxx >> >> > > > > >> >> > > > > I suspect the issue is not on bpf side. >> >> > > > > Looks like the bug is somewhere in arm32 bits. >> >> > > > > copy_from_kernel_nofault() is called from lots of places. >> >> > > > > bpf is just one user that is easy for syzbot to fuzz. >> >> > > > > Interestingly arm defines copy_from_kernel_nofault_allowed() >> >> > > > > that should have filtered out user addresses. >> >> > > > > In this case ffffffe9 is probably a kernel address? >> >> > > > >> >> > > > It's at the end of the kernel range, and it's ERR_PTR(-EINVAL). >> >> > > > >> >> > > > 0xffffffe9 is -0x16, which is -22, which is -EINVAL. >> >> > > > >> >> > > > > But the kernel is doing a write? >> >> > > > > Which makes no sense, since copy_from_kernel_nofault is probe reading. >> >> > > > >> >> > > > It makes perfect sense; the read from 'src' happened, then the kernel tries to >> >> > > > write the result to 'dst', and that aligns with the disassembly in the report >> >> > > > below, which I beleive is: >> >> > > > >> >> > > > 8: e4942000 ldr r2, [r4], #0 <-- Read of 'src', fault fixup is elsewhere >> >> > > > c: e3530000 cmp r3, #0 >> >> > > > * 10: e5852000 str r2, [r5] <-- Write to 'dst' >> >> > > > >> >> > > > As above, it looks like 'dst' is ERR_PTR(-EINVAL). >> >> > > > >> >> > > > Are you certain that BPF is passing a sane value for 'dst'? Where does that >> >> > > > come from in the first place? >> >> > > >> >> > > It looks to me like it gets passed in from the BPF program, and the >> >> > > "type" for the argument is set to ARG_PTR_TO_UNINIT_MEM. What that >> >> > > means for validation purposes, I've no idea, I'm not a BPF hacker. >> >> > > >> >> > > Obviously, if BPF is allowing copy_from_kernel_nofault() to be passed >> >> > > an arbitary destination address, that would be a huge security hole. >> >> > >> >> > If that's the case that's indeed a giant security hole, >> >> > but I doubt it. We would be crashing other archs as well. >> >> > I cannot really tell whether arm32 JIT is on. >> >> > If it is, it's likely a bug there. >> >> > Puranjay, >> >> > could you please take a look. >> >> > >> >> >> >> I dumped the BPF program that repro.c is loading, it works on x86-64 >> >> and there is nothing special there. We are probe-reading 5 bytes from >> >> somewhere into the stack. Everything is unaligned here, but stays >> >> within a well-defined memory slot. >> >> >> >> Note the r3 = (s8)r1, that's a new-ish thing, maybe bug is somewhere >> >> there (but then it would be JIT, not verifier itself) >> >> >> >> 0: (7a) *(u64 *)(r10 -8) = 896542069 >> >> 1: (bf) r1 = r10 >> >> 2: (07) r1 += -7 >> >> 3: (b7) r2 = 5 >> >> 4: (bf) r3 = (s8)r1 >> >> 5: (85) call bpf_probe_read_kernel#-72390 >> > >> >> I have started looking into this, the issue only reproduces when the JIT >> is enabled. With the interpreter, it works fine. >> >> I used GDB to dump the JITed BPF program: >> >> 0xbf00012c: push {r4, r5, r6, r7, r8, r9, r11, lr} >> 0xbf000130: mov r11, sp >> 0xbf000134: mov r3, #0 >> 0xbf000138: sub r2, sp, #80 @ 0x50 >> 0xbf00013c: sub sp, sp, #88 @ 0x58 >> 0xbf000140: strd r2, [r11, #-64] @ 0xffffffc0 >> 0xbf000144: mov r2, #0 >> 0xbf000148: strd r2, [r11, #-72] @ 0xffffffb8 >> 0xbf00014c: mov r2, r0 >> 0xbf000150: movw r8, #9589 @ 0x2575 >> 0xbf000154: movt r8, #13680 @ 0x3570 >> 0xbf000158: mov r9, #0 >> 0xbf00015c: ldr r6, [r11, #-64] @ 0xffffffc0 >> 0xbf000160: str r8, [r6, #-8] >> 0xbf000164: str r9, [r6, #-4] >> 0xbf000168: ldrd r2, [r11, #-64] @ 0xffffffc0 >> 0xbf00016c: movw r8, #65529 @ 0xfff9 >> 0xbf000170: movt r8, #65535 @ 0xffff >> 0xbf000174: movw r9, #65535 @ 0xffff >> 0xbf000178: movt r9, #65535 @ 0xffff >> 0xbf00017c: adds r2, r2, r8 >> 0xbf000180: adc r3, r3, r9 >> 0xbf000184: mov r6, #5 >> 0xbf000188: mov r7, #0 >> 0xbf00018c: strd r6, [r11, #-8] >> 0xbf000190: ldrd r6, [r11, #-16] > > Up to this point, it looks correct. r2/r3 contain the stack pointer > which corresponds to the instruction at "2:" > >> 0xbf000194: lsl r2, r2, #24 >> 0xbf000198: asr r2, r2, #24 >> 0xbf00019c: str r2, [r11, #-16] > > This then narrows the 64-bit pointer down to just 8!!! bits, but this > is what the instruction at "4:" is asking for. However, it looks like > it's happening to BPF's "r1" rather than "r3" and this is probably > where the problem lies. > > I haven't got time to analyse this further this morning - I'm only > around sporadically today. I'll try to look deeper at this later on. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! I found the problem. The implementation of Sign extended move is broken, it clobbers the source register. I have sent a patch to fix it and also fixed another issue that I saw: https://lore.kernel.org/bpf/20240409095038.26356-1-puranjay@xxxxxxxxxx/ I have manually tested with the reproducer but let's try to rerun the reproducer through syzbot: #syz test: https://github.com/puranjaymohan/linux.git arm32_movsx_fix