Re: Kernel crash in free_pipe_info()

Simon Brewer <sbrunau@xxxxxxxxx> · Fri, 10 Nov 2017 17:07:50 +1100

On 1 November 2017 at 14:19, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>>>
>>> 1. The faulty addresses are all near 0000000100000000, with one exception
>>> of null (which is the most recent one)
>>
>> Well, they're at 8(%rax), except for that last case.
>>
>> And in every case (_including_ that last case), %rax has a very
>> interesting pattern.. That's the (bad) buf->ops pointer that was
>> loaded from the somehow corrupted "buf".
>>
>> The values in all cases are
>>
>> 00000000fffffffa
>> 00000000fffffffd
>> 00000000fffffff1
>> 00000000fffffff7
>> 00000000fffffff4
>> 00000000fffffffa
>> 00000000fffffffd
>> 00000000fffffffd
>> 00000000fffffffa
>> 00000000ffffffe8
>> 00000000fffffff1
>> 00000000fffffff7
>>
>> which kind of looks like a 32-bit error value. So we have (n, val, (errno)):
>>
>>       1 -24 (EMFILE)
>>       2 -15 (ENOTBLK)
>>       1 -12 (ENOMEM)
>>       2 -9 (EBADF)
>>       3 -6 (ENXIO)
>>       3 -3 (ESRCH)
>>
>> none of which makes any sense to me, but it's an interesting pattern
>> nonetheless.
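
For anyone who wants to re-check the decoding above, here is a minimal
Python sketch. It assumes the low 32 bits of each %rax value are a
two's-complement int, and it uses a hand-written table of Linux errno
names covering only the six numbers seen in this thread. The helper
names (as_signed32, tally) are made up for illustration and are not
from any kernel tool.

```python
from collections import Counter

# %rax values from the twelve crash reports quoted above.
RAX_VALUES = [
    0x00000000fffffffa, 0x00000000fffffffd, 0x00000000fffffff1,
    0x00000000fffffff7, 0x00000000fffffff4, 0x00000000fffffffa,
    0x00000000fffffffd, 0x00000000fffffffd, 0x00000000fffffffa,
    0x00000000ffffffe8, 0x00000000fffffff1, 0x00000000fffffff7,
]

# Linux errno names for the magnitudes that occur, as listed in the thread.
LINUX_ERRNO = {
    3: "ESRCH", 6: "ENXIO", 9: "EBADF",
    12: "ENOMEM", 15: "ENOTBLK", 24: "EMFILE",
}


def as_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement int."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def tally(values):
    """Return (count, signed value, errno name) rows, most negative first."""
    counts = Counter(as_signed32(v) for v in values)
    return [(counts[val], val, LINUX_ERRNO.get(-val, "?"))
            for val in sorted(counts)]


if __name__ == "__main__":
    for n, val, name in tally(RAX_VALUES):
        print(f"{n:7d} {val} ({name})")
```

Run as a script, it prints one row per distinct value, in the same
order as the table above.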
>
>
> Yeah, good find!
>
>
>>
>>> 2. R12 register, which should map to the local variable 'i', is always 0x8
>>> at the time of crash.
>>
>> So _if_ this is some kind of use-after-free thing, and the allocation
>> got re-used for something else, that might just be related to whatever
>> ends up being the offset that is filled in with the (int) error
>> number.
>>
>> Except the offset is that %r12*0x28+0x10, so we're talking a byte
>> offset of 336 bytes into the allocation, and apparently the eight
>> previous (0-7) iterations were fine.
>>
>> Which is really odd.
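
The offset arithmetic can be checked with a few lines of Python. This
is only a sketch: it assumes the 0x28 scale is the size of one pipe
buffer entry and the 0x10 displacement is where the ops pointer sits
within it, which is how the addressing mode reads but is not confirmed
against the exact kernel build in question. The function names are
hypothetical.

```python
BUF_STRIDE = 0x28  # scale factor from the addressing mode (assumed entry size)
OPS_OFFSET = 0x10  # displacement from the addressing mode (assumed ops offset)


def ops_byte_offset(i):
    """Byte offset into the buffer array of the ops pointer for entry i."""
    return i * BUF_STRIDE + OPS_OFFSET


def fault_address(rax):
    """Address touched by an access at 8(%rax)."""
    return rax + 8


if __name__ == "__main__":
    # %r12 was 0x8 in every crash.
    print(hex(ops_byte_offset(8)), ops_byte_offset(8))
    # One of the reported %rax values.
    print(hex(fault_address(0xFFFFFFFA)))
```

With %r12 = 8 the offset is 0x150 (336 decimal), and adding 8 to an
%rax such as 0xfffffffa lands at 0x100000002, which matches the
observation that the faulting addresses sit near 0000000100000000.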
>>
>> I'm not seeing anything that makes sense. I'll have to think about this.
>>
>> I'm assuming you don't have slub debugging enabled, and no way to
>> enable it and try to catch this?
>
> We enable it at compile-time but not at run-time:
>
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> CONFIG_SLUB_CPU_PARTIAL=y
> # CONFIG_SLUB_DEBUG_ON is not set
> # CONFIG_SLUB_STATS is not set
>
> I can try to manually add slub_debug to the boot parameters, but I still
> have no idea how or when we can trigger this bug again.
>
>
> Thanks!

This looks familiar...

https://github.com/moby/moby/issues/34472

From the bug report:
"In particular, it looks like either docker-containerd or
docker-containerd-shim (the log is cut off) has a pipe open that is
causing a kernel BUG when attempting to kill the process. Fun times."