On Wed, Mar 06, 2019 at 10:14:47AM +0000, Guillaume Tucker wrote:
> On 01/03/2019 23:23, Dan Williams wrote:
> > On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> > <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> >
> > Is there an early-printk facility that can be turned on to see how far
> > we get in the boot?
>
> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> earlyprintk in the command line. Here's the result, with the
> commit cherry picked on top of next-20190304:
>
> https://lava.collabora.co.uk/scheduler/job/1526326
>
> [ 1.379522] ti-sysc 4804a000.target-module: sysc_flags 00000222 != 00000022
> [ 1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> [ 1.404203] pgd = (ptrval)
> [ 1.406971] [77bb4003] *pgd=00000000
> [ 1.410650] Internal error: Oops: 5 [#1] ARM
> [...]
> [ 1.672310] [<c07051a0>] (clk_hw_create_clk.part.21) from [<c06fea34>] (devm_clk_get+0x4c/0x80)
> [ 1.681232] [<c06fea34>] (devm_clk_get) from [<c064253c>] (sysc_probe+0x28c/0xde4)
>
> It's always failing at that point in the code. Also when
> enabling "debug" on the kernel command line, the issue goes
> away (exact same binaries etc..):
>
> https://lava.collabora.co.uk/scheduler/job/1526327
>
> For the record, here's the branch I've been using:
>
> https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
>
> The board otherwise boots fine with next-20190304 (SMP=n), and
> also with the patch applied but the shuffle configs set to n.
>
> > Were there any boot *successes* on ARM with shuffling enabled? I.e.
> > clues about what's different about the specific memory setup for
> > beagle-bone-black.
>
> Looking at the KernelCI results from next-20190215, it looks like
> only the BeagleBone Black with SMP=n failed to boot:
>
> https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
>
> Of course that's not all the ARM boards that exist out there, but
> it's a fairly large coverage already.
>
> As the kernel panic always seems to originate in ti-sysc.c,
> there's a chance it's only visible on that platform... I'm doing
> a KernelCI run now with my test branch to double check that,
> it'll take a few hours so I'll send an update later if I get
> anything useful out of it.
>
> In the meantime, I'm happy to try out other things with more
> debug configs turned on or any potential fixes someone might
> have.

ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
failure has something to do with it...

Guillaume, can you try this patch:

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 3ce1248..4a04aac 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -58,7 +58,8 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
  * For two pages to be swapped in the shuffle, they must be free (on a
  * 'free_area' lru), have the same order, and have the same migratetype.
  */
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order,
+						  struct zone *z)
 {
 	struct page *page;
 
@@ -80,6 +81,9 @@ static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
 	if (!PageBuddy(page))
 		return NULL;
 
+	if (!memmap_valid_within(pfn, page, z))
+		return NULL;
+
 	/*
 	 * ...is the page on the same list as the page we will
 	 * shuffle it with?
@@ -123,7 +127,7 @@ void __meminit __shuffle_zone(struct zone *z)
 			 * page_j randomly selected in the span @zone_start_pfn to
 			 * @spanned_pages.
 			 */
-			page_i = shuffle_valid_page(i, order);
+			page_i = shuffle_valid_page(i, order, z);
 			if (!page_i)
 				continue;
 
@@ -137,7 +141,7 @@ void __meminit __shuffle_zone(struct zone *z)
 				j = z->zone_start_pfn +
 					ALIGN_DOWN(get_random_long() % z->spanned_pages,
 							order_pages);
-				page_j = shuffle_valid_page(j, order);
+				page_j = shuffle_valid_page(j, order, z);
 				if (page_j && page_j != page_i)
 					break;
 			}

-- 
Sincerely yours,
Mike.