On Tue, Aug 25, 2020 at 03:58:27PM -0700, Doug Berger wrote: > I recently tracked down a problem I observed when booting a v5.4 kernel > on a sparsemem UMA arm platform which includes a no-map reserved-memory > region in the middle of its HighMem zone. > > When memmap_init_zone() is invoked the pfn's that correspond to the > no-map region fail the early_pfn_valid() check and the struct page > structures are not initialized creating a "hole" in the memmap. Later in > my boot sequence the sock_init() initcall leads to a bpf_prog_alloc() > which ends up stealing a page from the block containing the no-map > region which then leads to a call of move_freepages_block() to > reclassify the migratetype of the entire block. > > The function move_freepages() includes a check of pfn_valid_within for > each page in the range, but since the arm architecture doesn't include > HOLES_IN_ZONE this check is optimized out and the uninitialized struct > page is accessed. Specifically, PageLRU() calls compound_head() on the > page and if the page->compound_head value is odd the value is used as a > pointer to the head struct page. For uninitialized memory there is a > high chance that a random value of compound head will be odd and contain > an invalid pointer value that causes the kernel to abort and panic. > > As you might imagine specifying HOLES_IN_ZONE for the arm build allows > pfn_valid_within to protect against accessing the uninitialized struct > page. However, the performance penalty this incurs seems unnecessary. > > Commit 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations") as > part of the "sparse_init rewrite" series introduced in v4.19 changed the > way sparsemem memmaps are initialized. Prior to this patch the sparsemem > memmaps are initialized to all 0's. I observed that on older kernels the > "uninitialized" struct page access also occurs, but the 0 > page->compound_head indicates no compound head and the page pointer is > therefore not corrupted. The other logic ends up causing the page to be > skipped and everything "happens to work". > > While considering solutions to this issue I observed that the problem > does not occur in the current upstream as a result of a combination of > other commits. The following commits provided functionality to > initialize struct page structures for pages that are unavailable like > the no-map region in my system: > commit a4a3ede2132a ("mm: zero reserved and unavailable struct pages") > commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages") > commit ec393a0f014e ("mm: return zero_resv_unavail optimization") > commit e822969cab48 ("mm/page_alloc.c: fix uninitialized memmaps on a > partially populated last section") > commit 4b094b7851bf ("mm/page_alloc.c: initialize memmap of unavailable > memory directly") > > However, those commits added the functionality to the free_area_init() > and free_area_init_nodes() functions and the non-NUMA arm architecture > did not begin calling free_area_init() until the following commit in v5.8: > commit a32c1c61212d ("arm: simplify detection of memory zone boundaries") > > Prior to that commit the non-NUMA arm architecture called > free_area_init_node() directly at the end of zone_sizes_init(). > > So while the problem appears to be fixed upstream by commit a32c1c61212d > ("arm: simplify detection of memory zone boundaries") it is still > present in stable branches between v4.19.y and v5.7.y inclusive and > probably for architectures other than arm as well that didn't call > free_area_init(). This upstream commit is not easily/safely backportable > to stable branches, but if we focus on the sliver of functionality that > adds the initialization code from free_area_init() to the > zones_sizes_init() function used by non-NUMA arm kernels I believe a > simple patch could be developed for each relevant stable branch to > resolve the issue I am observing. Similar patches could also be applied > for other architectures that now call free_area_init() upstream but not > in one of these stable branches, but I am not in a position to test > those architectures. > > For the linux-5.4.y branch such a patch might look like this: > >From 671c341b5cdb8360349c33ade43115e28ca56a8a Mon Sep 17 00:00:00 2001 > From: Doug Berger <opendmb@xxxxxxxxx> > Date: Tue, 25 Aug 2020 14:39:43 -0700 > Subject: [PATCH] ARM: mm: sync zone_sizes_init with free_area_init > > The arm architecture does not invoke the common function > free_area_init(). Instead for non-NUMA builds it invokes > free_area_init_node() directly from zone_sizes_init(). > > As a result recent changes in free_area_init() are not > picked up by arm architecture builds. > > This commit adds the updates to the zone_sizes_init() > function to achieve parity with the free_area_init() > functionality. > > Fixes: 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations") > Signed-off-by: Doug Berger <opendmb@xxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > --- > arch/arm/mm/init.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c > index 6f19ba53fd1f..4f171d834c60 100644 > --- a/arch/arm/mm/init.c > +++ b/arch/arm/mm/init.c > @@ -169,6 +169,7 @@ static void __init zone_sizes_init(unsigned long > min, unsigned long max_low, > arm_dma_zone_size >> PAGE_SHIFT); > #endif > > + zero_resv_unavail(); > free_area_init_node(0, zone_size, min, zhole_size); > } > > -- > 2.7.4 > > I am unclear of the mechanics for submitting such a stable patch when it > represents a perhaps less than obvious sliver of the upstream commit > that fixes the issue, so I am soliciting guidance with this email. > > Thank you for taking the time to read this far, and please let me know > how I can improve the situation, I'm confused, what exactly are you asking for here? Is there a specific commit you wish to see backported to the 5.4 kernel, or are you trying to say you want something that is not in Linus's tree, backported instead? thanks, greg k-h