Re: BUG: KASAN: stack-out-of-bounds

Christophe Leroy <christophe.leroy@xxxxxx> · Thu, 28 Feb 2019 14:41:12 +0100

On 28/02/2019 at 10:47, Andrey Ryabinin wrote:

On 2/28/19 12:27 PM, Dmitry Vyukov wrote:
On Thu, Feb 28, 2019 at 10:22 AM Andrey Ryabinin
<aryabinin@xxxxxxxxxxxxx> wrote:

On 2/27/19 4:11 PM, Christophe Leroy wrote:

On 27/02/2019 at 10:19, Andrey Ryabinin wrote:

On 2/27/19 11:25 AM, Christophe Leroy wrote:
With version v8 of the series implementing KASAN on 32-bit powerpc (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=94309), I'm now able to activate KASAN on a mac99 in QEMU.

Then I get the following reports at startup. Which of the two reports I get seems to depend on the option used to build the kernel, but for a given kernel I always get the same report.

Is that a real bug, and if so, how could I spot it? Or is something wrong in my implementation of KASAN?

I checked that after kasan_init(), the entire shadow memory contains only zeros.

I also tried with the strong STACK_PROTECTOR compiled in, but it made no difference and the stack protector detected nothing.

==================================================================
BUG: KASAN: stack-out-of-bounds in memchr+0x24/0x74
Read of size 1 at addr c0ecdd40 by task swapper/0

CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7+ #1133
Call Trace:
[c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
[c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
[c0e9dd10] [c089579c] memchr+0x24/0x74
[c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
[c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
[c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
--- interrupt: c0e9df00 at 0x400f330
      LR = init_stack+0x1f00/0x2000
[c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
[c0e9df20] [c0c28e44] early_irq_init+0x38/0x108
[c0e9df50] [c0c16434] start_kernel+0x310/0x488
[c0e9dff0] [00003484] 0x3484

The buggy address belongs to the variable:
   __log_buf+0xec0/0x4020
The buggy address belongs to the page:
page:c6eac9a0 count:1 mapcount:0 mapping:00000000 index:0x0
flags: 0x1000(reserved)
raw: 00001000 c6eac9a4 c6eac9a4 00000000 00000000 00000000 ffffffff 00000001
page dumped because: kasan: bad access detected

Memory state around the buggy address:
   c0ecdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0ecdc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >c0ecdd00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
                                     ^
   c0ecdd80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
   c0ecde00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
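
To help read the "Memory state" dump above, here is a minimal sketch in C. It assumes the usual KASAN granularity of one shadow byte per 8 bytes of memory and 16 shadow bytes per dump row; the stack poison values are the ones I recall from mm/kasan/kasan.h around v5.0. The function names are hypothetical helpers, not kernel API.

```c
#include <stdint.h>

/* Assumption: one shadow byte describes 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/*
 * Index of the shadow byte, within one 16-byte dump row, that
 * describes 'addr'. 'row_addr' is the memory address printed at
 * the start of the row.
 */
unsigned int shadow_index_in_row(uint32_t row_addr, uint32_t addr)
{
    return (addr - row_addr) >> SHADOW_SCALE_SHIFT;
}

/* Human-readable meaning of a shadow byte (stack-related subset). */
const char *shadow_byte_meaning(uint8_t b)
{
    switch (b) {
    case 0x00: return "all 8 bytes accessible";
    case 0xF1: return "stack left redzone";
    case 0xF2: return "stack mid redzone";
    case 0xF3: return "stack right redzone";
    case 0xF4: return "stack partial redzone";
    default:
        /* 1..7 means only the first N bytes are accessible. */
        return b < 8 ? "first N bytes accessible" : "other poison value";
    }
}
```

For the report above, shadow_index_in_row(0xc0ecdd00, 0xc0ecdd40) gives 8, and the ninth byte of that row is f1, which is why the access is classified as stack-out-of-bounds.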

This one doesn't look good. Notice that it says stack-out-of-bounds, but at the same time there is
     "The buggy address belongs to the variable:  __log_buf+0xec0/0x4020"
   which is printed by the following code:
     if (kernel_or_module_addr(addr) && !init_task_stack_addr(addr)) {
         pr_err("The buggy address belongs to the variable:\n");
         pr_err(" %pS\n", addr);
     }

So a stack-unrelated address got stack-related poisoning. This could be a stack overflow; did you increase THREAD_SHIFT?
KASAN with stack instrumentation significantly increases stack usage.

I get the above with THREAD_SHIFT set to 13 (default value).
If I increase it to 14, I get the following instead. In that case the problem arises much earlier in the boot process (but still after the final kasan shadow setup).

We usually use 15 (with 4k pages), but I think 14 should be enough for a clean boot.
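
For reference, the THREAD_SHIFT values discussed here translate into kernel stack sizes as sketched below, assuming the stack size is 1 << THREAD_SHIFT bytes as on powerpc. toy_thread_size() is a hypothetical helper for illustration only.

```c
#include <stddef.h>

/* Assumption: the kernel stack size is (1 << THREAD_SHIFT) bytes. */
size_t toy_thread_size(unsigned int thread_shift)
{
    return (size_t)1 << thread_shift;
}
```

Under that assumption, 13 gives 8 KiB, 14 gives 16 KiB and 15 gives 32 KiB.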

==================================================================
BUG: KASAN: stack-out-of-bounds in pmac_nvram_init+0x1f8/0x5d0
Read of size 1 at addr f6f37de0 by task swapper/0

CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7+ #1143
Call Trace:
[c0e9fd60] [c01c43c0] print_address_description+0x164/0x2bc (unreliable)
[c0e9fd90] [c01c46a4] kasan_report+0xfc/0x180
[c0e9fdd0] [c0c226d4] pmac_nvram_init+0x1f8/0x5d0
[c0e9fef0] [c0c1f73c] pmac_setup_arch+0x298/0x314
[c0e9ff20] [c0c1ac40] setup_arch+0x250/0x268
[c0e9ff50] [c0c151dc] start_kernel+0xb8/0x488
[c0e9fff0] [00003484] 0x3484

Memory state around the buggy address:
  f6f37c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  f6f37d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >f6f37d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
                                                ^
  f6f37e00: 00 00 01 f4 f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2
  f6f37e80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
==================================================================

Powerpc's show_stack() prints stack addresses, so we know that the stack is somewhere near 0xc0e9f...
f6f37de0 is definitely not a stack address, and it is too far away for a stack overflow.
So it looks like the shadow for the stack, kasan_mem_to_shadow(0xc0e9f...), and the shadow for the address in the report, kasan_mem_to_shadow(0xf6f37de0),
point to the same physical page.

Shouldn't shadow start at 0xf8 for powerpc32? I did some math
yesterday which I think led me to 0xf8.

Dunno, maybe. How is this relevant? In case you are referring to the 0xf6f* addresses in the report,
these are not shadow addresses, but the accessed addresses.
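
To make the address arithmetic in this exchange concrete, here is a minimal sketch of the generic KASAN mapping (shadow = (addr >> 3) + offset). The constants are assumptions, not values taken from the series: kernel space is assumed to start at 0xc0000000, and the shadow is assumed to start at 0xf8000000, following the 0xf8 figure mentioned above. All toy_* names are hypothetical.

```c
#include <stdint.h>

#define TOY_SHADOW_SCALE_SHIFT 3            /* 1 shadow byte per 8 bytes */
#define TOY_PAGE_OFFSET  0xc0000000u        /* assumed start of kernel space */
#define TOY_SHADOW_START 0xf8000000u        /* assumed start of shadow area */

/* Offset chosen so that TOY_PAGE_OFFSET maps to TOY_SHADOW_START. */
#define TOY_SHADOW_OFFSET \
    (TOY_SHADOW_START - (TOY_PAGE_OFFSET >> TOY_SHADOW_SCALE_SHIFT))

/* Virtual address of the shadow byte describing 'addr'. */
uint32_t toy_mem_to_shadow(uint32_t addr)
{
    return (addr >> TOY_SHADOW_SCALE_SHIFT) + TOY_SHADOW_OFFSET;
}

/* Bytes of memory a shadow spanning TOY_SHADOW_START..4GB can describe. */
uint64_t toy_shadow_coverage(void)
{
    return (0x100000000ull - TOY_SHADOW_START) << TOY_SHADOW_SCALE_SHIFT;
}
```

With these assumed constants, a stack address such as 0xc0e9f000 and the reported address 0xf6f37de0 map to different virtual shadow addresses, so they can only share poison if the page tables (or stale hardware translations) back both with the same physical page. The coverage works out to 1GB.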

Thanks for your help. Indeed, you made me realise that the access is
to an IO mapping, which is therefore covered by the zero shadow page.

After some investigation I saw that the zero shadow page was being
poisoned, although I confirmed it was mapped RO in every page table
entry referencing it.

What I finally discovered is that the HW still had some of the
early page table entries pointing to the zero page in RW.

The reason is that the PGD has multiple entries pointing to
kasan_early_shadow_pte[]. On powerpc hash32, the _PAGE_HASHPTE flag
is set in a PTE to record that it has been given to the HW. When
flush_tlb_kernel_range() is called, the kernel walks the page tables,
really flushes only the pages that have _PAGE_HASHPTE set, and then
clears the flag. The consequence is that when the kernel walks that
same PTE again through a different PGD entry, it sees it as no longer
needing a flush.
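
The effect described above can be reproduced with a toy model. This is only an illustrative sketch, not kernel code: TOY_PAGE_HASHPTE stands in for _PAGE_HASHPTE, toy_shared_pte[] for the PTE table shared by several PGD entries, and toy_hw_entry[][] for the translations the hardware hash table still holds. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_HASHPTE 0x1u   /* stand-in for _PAGE_HASHPTE */
#define TOY_NR_PGD 4
#define TOY_NR_PTE 4

/* One PTE table shared by every PGD entry. */
static unsigned int toy_shared_pte[TOY_NR_PTE];
/* true if the HW still holds a translation for (pgd, pte). */
static bool toy_hw_entry[TOY_NR_PGD][TOY_NR_PTE];

void toy_reset(void)
{
    memset(toy_shared_pte, 0, sizeof(toy_shared_pte));
    memset(toy_hw_entry, 0, sizeof(toy_hw_entry));
}

/* An access through PGD entry 'pgd' loads a HW entry and marks the PTE. */
void toy_access(size_t pgd, size_t pte)
{
    toy_hw_entry[pgd][pte] = true;
    toy_shared_pte[pte] |= TOY_PAGE_HASHPTE;
}

/* Walk all PGD entries; flush only PTEs still flagged, then clear the flag. */
void toy_flush_all(void)
{
    for (size_t pgd = 0; pgd < TOY_NR_PGD; pgd++) {
        for (size_t pte = 0; pte < TOY_NR_PTE; pte++) {
            if (!(toy_shared_pte[pte] & TOY_PAGE_HASHPTE))
                continue;   /* looks clean: flag was cleared on an earlier walk */
            toy_hw_entry[pgd][pte] = false;
            toy_shared_pte[pte] &= ~TOY_PAGE_HASHPTE;
        }
    }
}

/* Number of HW translations that survived the flush. */
size_t toy_stale_entries(void)
{
    size_t n = 0;
    for (size_t pgd = 0; pgd < TOY_NR_PGD; pgd++)
        for (size_t pte = 0; pte < TOY_NR_PTE; pte++)
            n += toy_hw_entry[pgd][pte];
    return n;
}

/* Touch the same shared PTE through three PGD entries, then flush. */
size_t toy_demo(void)
{
    toy_reset();
    toy_access(0, 0);
    toy_access(1, 0);
    toy_access(2, 0);
    toy_flush_all();
    return toy_stale_entries();
}
```

In this model only the translation reached through the first PGD entry is flushed; the other two stay in the "hardware" because the shared flag was already cleared when the walk reached them.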

So the fix, which I'm finalising at the moment, is to have
the final shadow page table layout set up as soon as memblock is
available and before switching from the early hash table to the final
hash table.

Christophe

This allows covering at most 1GB of memory. Do you have more by any chance?