Re: Dell T7500 won’t boot starting at 4.19.13 for fedora 29

stan <stanl-fedorauser@xxxxxxxxxxx> · Tue, 22 Jan 2019 08:43:24 -0700

On Mon, 21 Jan 2019 18:48:04 -0500
Nate Pearlstein <darknater@xxxxxxxxx> wrote:

> I normally run w/o quiet and rhgb anyway.  I added earlyprintk=vga
> and it’s clear the system panics early.  I tried adding
> boot_delay=500 and also boot_delay=10 to try to capture the spew with
> my phone camera capturing at 60fps.  Only when leaving off boot_delay can
> I see the panic, but the output is coming faster than 60fps.
> 
> From what I can piece together without using a serial console and
> capturing from another host:
> 
> kernel BUG at mm/page_alloc.c:791!
> Invalid opcode: 0000 [#10 SMP PTI] (not sure about this; too jumbled)
> I can’t really see the stack trace either
> __free_pages_ok
> free_all_bootmem
> mem_init
> start_kernel
> secondary_startup_64
> [1.860030] free_one_page RIP: 0010:free_one_page
> [1.863221] Code: 08 0e 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff
> e8 56 02 00 e9 9c fb ff ff 48 c7 c6 08 86 0d 92 4c 89 f7 e8 e2 0d 03
> 00 <0f> 0b 48 c6 30 86 0d 92 48 89 df e8 d1 0d 03 00 0f 0b 31 d2 e9
> [1.872806] RSP: 0000:ffffffff92203e20 EFLAGS: 00010046
> .
> .
> [1.923827] Kernel panic - not syncing

Samuel might be able to decipher this, but I have an off-the-wall idea.
Kernels get bigger with each release.  I wonder if there is a memory
problem that earlier kernels don't trigger but larger kernels do.  Run
a memory test (memtest86+, for example)?
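For a sense of what such a test does, here is a toy C version of the "own address" pass that memtest86+ runs, with a complement pass added for illustration. The function names are hypothetical, and this is only a sketch: a userspace program can test nothing but the memory malloc() hands it, so booting memtest86+, which tests almost all of physical RAM, is the real check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy "address-in-address" check (hypothetical helper). Each word is
 * written with its own address and read back; a second pass uses the
 * bitwise complement so every bit is exercised as both 0 and 1.
 * Returns the number of words that did not read back as written. */
static size_t address_in_address_test(volatile uintptr_t *buf, size_t nwords)
{
    assert(buf != NULL);
    size_t errors = 0;

    /* Pass 1: store each word's own address in it, then verify. */
    for (size_t i = 0; i < nwords; i++)
        buf[i] = (uintptr_t)&buf[i];
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != (uintptr_t)&buf[i])
            errors++;

    /* Pass 2: same, with the complement of the address. */
    for (size_t i = 0; i < nwords; i++)
        buf[i] = ~(uintptr_t)&buf[i];
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != ~(uintptr_t)&buf[i])
            errors++;

    return errors;
}

/* Allocate `mib` MiB and test it (hypothetical helper).
 * Returns (size_t)-1 if the allocation fails. */
static size_t test_mib(size_t mib)
{
    size_t nwords = mib * 1024 * 1024 / sizeof(uintptr_t);
    uintptr_t *buf = malloc(nwords * sizeof *buf);
    if (buf == NULL)
        return (size_t)-1;
    size_t errors = address_in_address_test(buf, nwords);
    free(buf);
    return errors;
}
```

Run on an older kernel that still boots, a nonzero result from something like test_mib(1024) would point at bad RAM; a zero proves little, since the faulty region may be memory the process never received.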

The other thing to try is reinstalling the kernel.  A really long
shot, but worth a try.

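And maybe it is a kernel bug.  The line you are referring to is the
VM_BUG_ON_PAGE() check in __free_one_page(), which runs when a page is
being deallocated.  Since that function is static inline, it was
presumably inlined into free_one_page(), which is why that name shows
up in the RIP line.

static inline void __free_one_page(struct page *page,
		unsigned long pfn,
		struct zone *zone, unsigned int order,
		int migratetype)
{
	...
	VM_BUG_ON_PAGE(bad_range(zone, page), page);
	...
}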
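To make the bad_range() check concrete, here is a heavily simplified userspace model of what it tests, based on the 4.19 mm/page_alloc.c source. The toy_zone and toy_page structs and their fields are hypothetical stand-ins, not the kernel's struct zone and struct page, and the real code also deals with locking and pfn validity, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins that keep only what the check needs. */
struct toy_zone {
    unsigned long zone_start_pfn;  /* first page frame number in the zone */
    unsigned long spanned_pages;   /* number of pfns the zone spans */
};

struct toy_page {
    unsigned long pfn;             /* page frame number of this page */
    const struct toy_zone *zone;   /* zone the page's flags say it is in */
};

/* Does the zone's pfn range [start, start + spanned) contain pfn? */
static bool zone_spans_pfn(const struct toy_zone *zone, unsigned long pfn)
{
    return zone->zone_start_pfn <= pfn &&
           pfn < zone->zone_start_pfn + zone->spanned_pages;
}

/* Nonzero means the page being freed does not belong to the zone it is
 * being freed into; in the kernel that trips VM_BUG_ON_PAGE(). */
static int bad_range(const struct toy_zone *zone, const struct toy_page *page)
{
    assert(zone != NULL && page != NULL);
    if (!zone_spans_pfn(zone, page->pfn))
        return 1;   /* outside the zone's boundaries */
    if (page->zone != zone)
        return 1;   /* inside the range, but recorded as another zone's page */
    return 0;
}
```

Read this way, the BUG says a page being freed at boot did not fit the zone it was being returned to, which is the kind of inconsistency that corrupted memory could plausibly produce.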

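I interpret the errors as saying that the kernel hit that check while
deallocating a page.  The "invalid opcode" does not mean the CPU fetched
a garbage instruction: on x86, BUG() deliberately executes a ud2
instruction (the bytes 0f 0b, marked <0f> in the Code: line), and the
0000 is just the error code printed with the trap.  So the question is
whether the check caught a real kernel bug, or whether the kernel was
working from bad data, for example a location in faulty memory.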
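The trapping instruction can be read straight off the Code: line, because the oops prints the byte at the faulting RIP in angle brackets. A small C sketch (function names are hypothetical) that pulls out that byte and the one after it and checks for ud2 (0f 0b). It assumes the "ob" in the hand-transcribed line is a misread "0b", since "ob" is not a hex byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Locate the "<xx>" marker the oops puts around the byte at RIP and
 * return that byte and the next one packed as (first << 8) | second,
 * or -1 if the marker or the following byte is missing. */
static int faulting_opcode(const char *code_line)
{
    const char *mark = strchr(code_line, '<');
    if (mark == NULL)
        return -1;
    unsigned int first, second;
    /* "<%2x>" reads the marked byte; " %2x" skips blanks, reads the next. */
    if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
        return -1;
    return (int)((first << 8) | second);
}

/* 0f 0b is ud2, which x86 BUG() executes on purpose; the resulting
 * exception is what the oops reports as "invalid opcode". */
static int is_ud2(int opcode)
{
    return opcode == 0x0f0b;
}
```

With the corrected fragment "00 <0f> 0b 48", faulting_opcode() returns 0x0f0b and is_ud2() is true, consistent with a deliberate BUG() rather than a corrupted instruction stream.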

I think it has to be something about your hardware, because if the
kernel were actually having trouble deallocating pages on every boot,
this would be a well-known problem.  Maybe you have hit a corner case.
You could open a bugzilla, but it will be difficult for anyone to fix
this without either your hardware to reproduce the crash or the
complete crash output.

The 4.20 kernel series is not far from reaching stable.  You could
either grab one from koji,
https://koji.fedoraproject.org/koji/packageinfo?packageID=8
or stay on an older kernel until it is released.  4.20 might fix the
issue as a side effect of other changes.
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx