Re: Kernel crash in free_pipe_info()

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 30 Oct 2017 19:08:46 -0700

On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> 1. The faulty addresses are all near 0000000100000000, with one exception
> of null (which is the most recent one)

Well, they're at 8(%rax), except for that last case.
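
As a quick check of that, here is a minimal sketch that computes the
address an 8(%rax) operand would touch for each distinct %rax value
reported in the oopses. The helper name fault_address is made up for
illustration, and the final null fault is not modelled.

```python
# AT&T operand 8(%rax) addresses memory at %rax + 8.
MASK64 = (1 << 64) - 1

# Distinct %rax values from the oopses (the bad buf->ops pointers).
RAX_VALUES = [0xFFFFFFFA, 0xFFFFFFFD, 0xFFFFFFF1,
              0xFFFFFFF7, 0xFFFFFFF4, 0xFFFFFFE8]

def fault_address(rax, disp=8):
    """Effective address of disp(%rax), wrapped to 64 bits."""
    return (rax + disp) & MASK64

for rax in RAX_VALUES:
    addr = fault_address(rax)
    delta = addr - 0x100000000
    print(f"rax={rax:016x} -> 8(%rax)={addr:016x} ({delta:+d})")
```

Every result lands within 16 bytes of 0000000100000000, which would
account for the faulting addresses all clustering there.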

And in every case (_including_ that last case), %rax has a very
interesting pattern. That's the (bad) buf->ops pointer that was
loaded from the somehow corrupted "buf".

The values in all cases are

00000000fffffffa
00000000fffffffd
00000000fffffff1
00000000fffffff7
00000000fffffff4
00000000fffffffa
00000000fffffffd
00000000fffffffd
00000000fffffffa
00000000ffffffe8
00000000fffffff1
00000000fffffff7

which kind of looks like a 32-bit error value. So we have (n, val, (errno)):

      1 -24 (EMFILE)
      2 -15 (ENOTBLK)
      1 -12 (ENOMEM)
      2 -9 (EBADF)
      3 -6 (ENXIO)
      3 -3 (ESRCH)

none of which makes any sense to me, but it's an interesting pattern
nonetheless.
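
The tally above can be reproduced with a short sketch: treat the low
32 bits of each %rax value as a signed int, then count. The errno
names are hard-coded for just the six numbers that occur here (they
are the usual Linux values), and the helper names as_s32 and tally
are made up for illustration.

```python
from collections import Counter

# The twelve %rax values, in the order listed above.
RAX_VALUES = [
    0x00000000FFFFFFFA, 0x00000000FFFFFFFD, 0x00000000FFFFFFF1,
    0x00000000FFFFFFF7, 0x00000000FFFFFFF4, 0x00000000FFFFFFFA,
    0x00000000FFFFFFFD, 0x00000000FFFFFFFD, 0x00000000FFFFFFFA,
    0x00000000FFFFFFE8, 0x00000000FFFFFFF1, 0x00000000FFFFFFF7,
]

# Only the errno numbers that show up here.
ERRNO_NAMES = {3: "ESRCH", 6: "ENXIO", 9: "EBADF",
               12: "ENOMEM", 15: "ENOTBLK", 24: "EMFILE"}

def as_s32(value):
    """Interpret the low 32 bits of value as a signed 32-bit int."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low

def tally(values):
    """Return (n, val, errno name) rows, most negative val first."""
    counts = Counter(as_s32(v) for v in values)
    return [(n, val, ERRNO_NAMES.get(-val, "?"))
            for val, n in sorted(counts.items())]

for n, val, name in tally(RAX_VALUES):
    print(f"{n:7d} {val} ({name})")
```

Note that the upper 32 bits are zero in every value, so these are
zero-extended 32-bit quantities rather than sign-extended 64-bit
error pointers.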

> 2. R12 register, which should map to the local variable 'i', is always 0x8
> at the time of crash.

So _if_ this is some kind of use-after-free thing, and the allocation
got re-used for something else, that might just be related to whatever
ends up being the offset that is filled in with the (int) error
number.

Except the offset is that %r12*0x28+0x10, so we're talking a byte
offset of 336 bytes into the allocation, and apparently the eight
previous (0-7) iterations were fine.

Which is really odd.
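
For reference, the offset arithmetic can be sketched with a ctypes
model of the buffer array. The field layout below is an assumption
based on struct pipe_buffer as it looked in kernels of that era
(page, offset, len, ops, flags, private), with pointers and unsigned
long modelled as 64-bit integers; it is not taken from the crashing
kernel's source. It assumes a 64-bit host so that the 8-byte fields
are 8-byte aligned.

```python
import ctypes

class PipeBuffer(ctypes.Structure):
    """Assumed x86-64 layout of struct pipe_buffer (illustrative)."""
    _fields_ = [
        ("page", ctypes.c_uint64),     # struct page *
        ("offset", ctypes.c_uint32),   # unsigned int
        ("len", ctypes.c_uint32),      # unsigned int
        ("ops", ctypes.c_uint64),      # const struct pipe_buf_operations *
        ("flags", ctypes.c_uint32),    # unsigned int, then 4 bytes padding
        ("private", ctypes.c_uint64),  # unsigned long
    ]

def ops_byte_offset(i):
    """Byte offset of bufs[i].ops from the start of the bufs array."""
    return i * ctypes.sizeof(PipeBuffer) + PipeBuffer.ops.offset

print(hex(ctypes.sizeof(PipeBuffer)))  # stride, matches the 0x28
print(hex(PipeBuffer.ops.offset))      # matches the 0x10
print(ops_byte_offset(8))              # offset for i == 8
```

With that layout the stride and displacement match the 0x28 and 0x10
in the addressing expression, and i == 8 gives 8*40+16 = 336.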

I'm not seeing anything that makes sense. I'll have to think about this.

I'm assuming you don't have slub debugging enabled, and no way to
enable it and try to catch this?

                   Linus