Re: [syzbot] [mm?] BUG: unable to handle kernel paging request in copy_from_kernel_nofault (2)

"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> · Fri, 5 Apr 2024 19:19:10 +0100

On Fri, Apr 05, 2024 at 10:50:30AM -0700, Andrii Nakryiko wrote:
> On Fri, Apr 5, 2024 at 9:30 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Apr 5, 2024 at 4:36 AM Russell King (Oracle)
> > <linux@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Apr 05, 2024 at 12:02:36PM +0100, Mark Rutland wrote:
> > > > On Thu, Apr 04, 2024 at 03:57:04PM -0700, Alexei Starovoitov wrote:
> > > > > On Wed, Apr 3, 2024 at 6:56 PM Andrew Morton <akpm@linux-foundationorg> wrote:
> > > > > >
> > > > > > On Mon, 01 Apr 2024 22:19:25 -0700 syzbot <syzbot+186522670e6722692d86@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > >
> > > > > > Thanks.  Cc: bpf@xxxxxxxxxxxxxxx
> > > > >
> > > > > I suspect the issue is not on bpf side.
> > > > > Looks like the bug is somewhere in arm32 bits.
> > > > > copy_from_kernel_nofault() is called from lots of places.
> > > > > bpf is just one user that is easy for syzbot to fuzz.
> > > > > Interestingly arm defines copy_from_kernel_nofault_allowed()
> > > > > that should have filtered out user addresses.
> > > > > In this case ffffffe9 is probably a kernel address?
> > > >
> > > > It's at the end of the kernel range, and it's ERR_PTR(-EINVAL).
> > > >
> > > > 0xffffffe9 is -0x16, which is -22, which is -EINVAL.
> > > >
> > > > > But the kernel is doing a write?
> > > > > Which makes no sense, since copy_from_kernel_nofault is probe reading.
> > > >
> > > > It makes perfect sense; the read from 'src' happened, then the kernel tries to
> > > > write the result to 'dst', and that aligns with the disassembly in the report
> > > > below, which I beleive is:
> > > >
> > > >      8: e4942000        ldr     r2, [r4], #0  <-- Read of 'src', fault fixup is elsewhere
> > > >      c: e3530000        cmp     r3, #0
> > > >   * 10: e5852000        str     r2, [r5]      <-- Write to 'dst'
> > > >
> > > > As above, it looks like 'dst' is ERR_PTR(-EINVAL).
> > > >
> > > > Are you certain that BPF is passing a sane value for 'dst'? Where does that
> > > > come from in the first place?
> > >
> > > It looks to me like it gets passed in from the BPF program, and the
> > > "type" for the argument is set to ARG_PTR_TO_UNINIT_MEM. What that
> > > means for validation purposes, I've no idea, I'm not a BPF hacker.
> > >
> > > Obviously, if BPF is allowing copy_from_kernel_nofault() to be passed
> > > an arbitary destination address, that would be a huge security hole.
> >
> > If that's the case that's indeed a giant security hole,
> > but I doubt it. We would be crashing other archs as well.
> > I cannot really tell whether arm32 JIT is on.
> > If it is, it's likely a bug there.
> > Puranjay,
> > could you please take a look.
> >
> 
> I dumped the BPF program that repro.c is loading, it works on x86-64
> and there is nothing special there. We are probe-reading 5 bytes from
> somewhere into the stack. Everything is unaligned here, but stays
> within a well-defined memory slot.
> 
> Note the r3 = (s8)r1, that's a new-ish thing, maybe bug is somewhere
> there (but then it would be JIT, not verifier itself)
> 
>    0: (7a) *(u64 *)(r10 -8) = 896542069
>    1: (bf) r1 = r10
>    2: (07) r1 += -7
>    3: (b7) r2 = 5
>    4: (bf) r3 = (s8)r1
>    5: (85) call bpf_probe_read_kernel#-72390

Before jumping to conclusions, let's try to unravel what's going on
here.

We're calling bpf_probe_read_kernel(), and the arguments to this are:

	void *dst		; r1
	u32 size		; r2
	const void *unsafe_ptr	; r3

The problem that has been reported is that the _store_ in
copy_from_kernel_nofault(). Thus it's the destination pointer that's
the proble, and thus that is the value that ends up in r1.

What we can also see in the dump is that the address being read from
is the same as the address being written, and these are both
0xffffffe9 or -22. This would mean that both r3 and r1 contain the
same value.

Unwinding the code further, r1 comes from r10 - 7. So r10 probably
was -15.

Neither of these are valid stack addresses on 32-bit ARM.

Now, to repeat the same question. Is the BPF JIT on for this test?
This is a crucial piece of information, because it tells us whether
we need to look at the JIT or whether there's a problem with the
BPF interpreter. Please answer this question.

The next question to BPF people is... what is r10? Is that supposed
to be the read-only frame pointer? If so, why is it called r10 and
not something more readable? I'm guessing that the definition is
BPF_REG_FP, but I'm grasping at straws here (BPF people, fix your
debug so people who don't know BPF inside out can understand it!)

If the BPF JIT is being used, I think the next thing which needs to
happen is that the BPF JIT debug needs to be enabled. If
/proc/sys/net/core/bpf_jit_enable contains a value greater than 1,
then the ARM assembly will be hexdumped. One of the annoying things
is going to be piecing the hexdump together, converting it into a
form that can then be turned into a binary file, to then be
disassembled by objdump.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!