Re: KASAN vs. boot-time switching between 4- and 5-level paging

Andy Lutomirski <luto@xxxxxxxxxx> · Tue, 11 Jul 2017 08:06:57 -0700

On Tue, Jul 11, 2017 at 3:35 AM, Kirill A. Shutemov
<kirill@xxxxxxxxxxxxx> wrote:
> On Mon, Jul 10, 2017 at 05:30:38PM -0700, Andy Lutomirski wrote:
>> On Mon, Jul 10, 2017 at 2:24 PM, Kirill A. Shutemov
>> <kirill@xxxxxxxxxxxxx> wrote:
>> > On Mon, Jul 10, 2017 at 01:07:13PM -0700, Andy Lutomirski wrote:
>> >> Can you give the disassembly of the backtrace lines?  Blaming the
>> >> .endr doesn't make much sense to me.
>> >
>> > I don't have backtrace. It's before printk() is functional. I only see
>> > triple fault and reboot.
>> >
>> > I had to rely on qemu tracing and gdb.
>>
>> Can you ask GDB or objtool to disassemble around those addresses?  Can
>> you also attach the big dump that QEMU throws out that shows register
>> state?  In particular, CR2, CR3, and CR4 could be useful.
>
> The last three exceptions:
>
> check_exception old: 0xffffffff new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3036
> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000

So RSP was 0xffffffff80000000, a push happened, and we tried to write
to 0xffffffff7ffffff8, which failed.
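
To spell out the arithmetic: a 64-bit push decrements RSP by 8 and then
stores through the new value, so the faulting address should be RSP - 8.
A quick sanity check, as a sketch only (the helper name push_target is
made up for illustration; the two constants are copied from the QEMU
dump above):

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide; wrap like the CPU does


def push_target(rsp: int, size: int = 8) -> int:
    """Address written by a push: RSP is decremented first, then stored through."""
    return (rsp - size) & MASK64


rsp = 0xFFFFFFFF80000000  # RSP from the register dump
cr2 = 0xFFFFFFFF7FFFFFF8  # faulting address reported in CR2

print(hex(push_target(rsp)))    # 0xffffffff7ffffff8
print(push_target(rsp) == cr2)  # True
```

The match suggests CR2 is simply the first slot below RSP, not some
unrelated wild pointer.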

> check_exception old: 0xe new 0xe, cr2: 0xffffffff7ffffff8, rip: 0xffffffff84bb3141
> RAX=00000000ffffffff RBX=ffffffff800000d8 RCX=ffffffff84be4021 RDX=dffffc0000000000
> RSI=0000000000000006 RDI=ffffffff84c57000 RBP=ffffffff800000c8 RSP=ffffffff80000000

And #PF doesn't use IST, so the CPU tried to push the exception frame
onto that same stack, faulted again, and double-faulted.
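
For reference, here is a rough sketch of how the "old"/"new" pairs in
the trace escalate, following the exception classes in the Intel SDM
(benign, contributory, page fault). This is a simplified model, not
QEMU's actual code. I'm assuming that old: 0xffffffff means "no
exception was being delivered", and the function and constant names are
made up for illustration:

```python
CONTRIBUTORY = {0, 10, 11, 12, 13}  # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = 14                     # #PF, vector 0xe
DOUBLE_FAULT = 8                    # #DF
NO_PRIOR = 0xFFFFFFFF               # assumption: nothing was in flight


def classify(old: int, new: int) -> str:
    """Outcome when exception `new` hits while `old` is still being delivered."""
    serious = new in CONTRIBUTORY or new == PAGE_FAULT
    if old == DOUBLE_FAULT and serious:
        return "triple fault"       # CPU shuts down; the VM resets
    if old in CONTRIBUTORY and new in CONTRIBUTORY:
        return "double fault"
    if old == PAGE_FAULT and serious:
        return "double fault"
    return "deliver"                # handled serially, no escalation


print(classify(NO_PRIOR, 0xE))      # deliver
print(classify(0xE, 0xE))           # double fault
print(classify(DOUBLE_FAULT, 0xE))  # triple fault
```

Under this model, the first quoted line is an ordinary #PF, the second
is the #PF-on-#PF that becomes #DF, and one more fault while delivering
#DF would give the triple fault and reboot.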

Either the stack isn't mapped in the page tables, RSP is corrupt, or
there's a genuine stack overflow here.
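
To check the first possibility by hand in GDB, it helps to know which
page-table entries the two addresses go through. The sketch below
splits a virtual address into its per-level table indices, assuming the
standard x86-64 layout (4 KiB pages, 9 index bits per level). The
function name table_indices is made up for illustration:

```python
def table_indices(addr: int, levels: int = 4) -> list[int]:
    """Page-table indices for addr, top level first (PGD ... PTE)."""
    # Each level consumes 9 bits; the lowest level starts above the 12-bit page offset.
    return [(addr >> (12 + 9 * i)) & 0x1FF for i in reversed(range(levels))]


rsp = 0xFFFFFFFF80000000  # RSP from the dump
cr2 = 0xFFFFFFFF7FFFFFF8  # faulting address from the dump

print(table_indices(rsp))            # [511, 510, 0, 0]
print(table_indices(cr2))            # [511, 509, 511, 511]
print(table_indices(cr2, levels=5))  # [511, 511, 509, 511, 511]
```

So with 4-level paging the push crosses from the 1 GiB region at index
510 into the one at index 509, and that is the entry I would look at
first when walking the tables from CR3.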