On 28.09.19 11:06, David Hildenbrand wrote: > On 28.09.19 00:17, Alexander Duyck wrote: >> On Fri, Sep 27, 2019 at 2:59 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: >>> >>> On Fri, 27 Sep 2019 17:28:06 -0400 Qian Cai <cai@xxxxxx> wrote: >>> >>>>> >>>>> So I think you've moved the arch_free_page() to be after the final >>>>> thing which can access page contents, yes? If so, we should have a >>>>> comment in free_pages_prepare() to attmept to prevent this problem from >>>>> reoccurring as the code evolves? >>>> >>>> Right, something like this above arch_free_page() there? >>>> >>>> /* >>>> * It needs to be just above kernel_map_pages(), as s390 could mark those >>>> * pages unused and then trigger a fault when accessing. >>>> */ >>> >>> I did this. >>> >>> --- a/mm/page_alloc.c~mm-page_alloc-fix-a-crash-in-free_pages_prepare-fix >>> +++ a/mm/page_alloc.c >>> @@ -1179,7 +1179,13 @@ static __always_inline bool free_pages_p >>> kernel_init_free_pages(page, 1 << order); >>> >>> kernel_poison_pages(page, 1 << order, 0); >>> + /* >>> + * arch_free_page() can make the page's contents inaccessible. s390 >>> + * does this. So nothing which can access the page's contents should >>> + * happen after this. >>> + */ >>> arch_free_page(page, order); >>> + >>> if (debug_pagealloc_enabled()) >>> kernel_map_pages(page, 1 << order, 0); >>> >> >> So the question I would have is what is the state of the page after it >> has been marked unused and then pulled back in? I'm assuming it will >> be all 0s. > > I think this comment relates to the s390x implementation, so I'll try to > explain that. After arch_free_page() the page might have been zapped in > the hypervisor, but that might happen deferred. The guest ends up > triggering the ESSA instruction in arch_free_page(). That instruction > sets some extended-page-table-related ("pgste") bits in the hypervisor > tables for the guest ("gmap") and fills a buffer with these entries. The > page is marked _PGSTE_GPS_USAGE_UNUSED. Yes. And that also means that architecturally it can be 0 or it can contain the old content depending on whether the host has paged that page out or not and how many pages have been marked unused. I am also sure that the implementation of z/VM and KVM do differ in that regard. For example KVM does not make use of the logical zero state but z/VM does. In essence you can consider this like a ballooner that takes away the page lazily. For a writeup of the details see https://www.kernel.org/doc/ols/2006/ols2006v2-pages-321-336.pdf (This also contains additional states that were never merged upstream)