Re: Instability in current -git tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 13, 2018 at 10:40 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 13, 2018 at 5:47 PM Pavel Tatashin
> <pasha.tatashin@xxxxxxxxxx> wrote:
> >
> > The commit intends to zero memmap (struct pages) for every hole in
> > e820 ranges by marking them reserved in memblock. Later
> > zero_resv_unavail() walks through memmap ranges and zeroes struct
> > pages for every page that is reserved, but does not have a physical
> > backing known by kernel.
>
> Ahh. That just looks incoredibly buggy.
>
> You can't just memset() the 'struct page' to zero after it's been set up.

That should not be happening, unless there is a bug.
zero_resv_unavail() is supposed be zeroing struct pages that were not
setup.

Struct pages are configured here:

free_area_init_nodes()
  free_area_init_node()
    free_area_init_core()
      memmap_init()
        memmap_init_zone()
         if (pfn_valid(pfn))   <--- Actually few more checks:
early_pfn_valid(pfn), early_pfn_in_nid(pfn, nid)
          __init_single_page(pfn_to_page(pfn))
            mm_zero_struct_page(page);
            set the other fields in struct page
  zero_resv_unavail(); <-- is called at the end of free_area_init_nodes()
        if (!pfn_valid(pfn))
            mm_zero_struct_page(pfn_to_page(pfn)); <- So the idea that
we zero only the part of memmap that
            was not initialized in __init_single_page().

We want to zero those struct pages so we do not have uninitialized
data accessed by various parts of the code that rounds down large
pages and access the first page in section without verifying that the
page is valid. The example of this is described in commit that
introduced zero_resv_unavail()

>
> That also zeroes page->flags, but page->flags contains things like the
> zone and node ID.
>
> That would explain the completely bogus "DMA" zone. That's not the
> real zone, it's just that page_zonenr() returns 0 because of an
> incorrect clearing of page->flags.
>
> And it would also completely bugger pfn_to_page() for
> CONFIG_DISCONTIGMEM, because the way that works is that it looks up
> the node using page_to_nid(), and then looks up the pfn by using the
> per-node pglist_data ->node_mem_map (that the 'struct page' is
> supposed to be a pointer into).
>
> So zerong the page->flags after it has been set up is completely wrong
> as far as I can see. It literally invalidates 'struct page' and makes
> various core VM function assumptions stop working.
>
> As an example, it makes "page_to_pfn -> pfn_to_page" not be the
> identity transformation, which could result in totally random
> behavior.
>
> And it definitely explains the whole "oh, now the zone ID doesn't
> match" issue. It &*should* have been zone #1 ("DMA32"), but it got
> cleared and that made it zone #0 ("DMA").
>
> So yeah, this looks like the cause of it. And it could result in any
> number of really odd problems, so this could easily explain the syzbot
> failures and reboots at bootup too. Who knows what happens when
> pfn_to_page() doesn't work any more.
>
> Should we perhaps just revert
>
>   124049decbb1 x86/e820: put !E820_TYPE_RAM regions into memblock.reserved
>   f7f99100d8d9 mm: stop zeroing memory during allocation in vmemmap
>
> it still reverts fairly cleanly (there's a trivial conflict with the
> older commit).
>
>               Linus
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux