On Mon, Jul 03, 2023 at 09:03:38AM +0200, Mirsad Goran Todorovac wrote:
> On 3.7.2023. 7:41, Kees Cook wrote:
> > On Mon, Jul 03, 2023 at 07:18:57AM +0200, Mirsad Goran Todorovac wrote:
> > > I apologise for confusion. In fact, I have cloned the Torvalds tree after
> > > 6.4.1 was released, but I actually cloned the Torvalds tree, not the 6.4.1
> > > from the stable branch as the Subject line might have misled.
> >
> > Thanks, no worries! I got myself confused too. :)
> >
> > The config you sent looks like I'd expect now too. Questions for you, if
> > you have time to diagnose further:
> >
> > - Are you able to catch the very beginning of the crash, where the Oops
> >   starts?
>
> It scrolls up very quickly. Couldn't catch that with the camera.
>
> > - Does pstore work for you to catch the crash?
>
> Haven't tried that yet. I will have to do some homework.

Try adding this to the .config:

# Enable PSTORE support
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS=y

# Enable UEFI pstore backend
CONFIG_EFI_VARS_PSTORE=y
# CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set

# Enable ACPI ERST pstore backend
CONFIG_ACPI=y
CONFIG_ACPI_APEI=y

A good write-up about using it is here:
https://blogs.oracle.com/linux/post/pstore-linux-kernel-persistent-storage-file-system
and covers the systemd-pstore details too. Note that in the config I
suggested, I've enabled the efi backend by default.

> > - Can you try booting with this patch applied?
> >   https://lore.kernel.org/lkml/20230629190900.never.787-kees@xxxxxxxxxx/
>
> Sure, but after 4 PM UTC+02 I suppose.

Cool. xhci-hub is in your backtrace, and the above patch was made for
something very similar (though, again, I don't see why you're getting a
_crash_, it should _warn_ and continue normally).
And, actually, also include this patch:
https://lore.kernel.org/lkml/20230614181307.gonna.256-kees@xxxxxxxxxx/

> > I'll try to see if I can figure out anything more from the images you
> > posted.

Yeah, the xhci-hub bit is the only clue I can see here. It's also in the
IRQ handler, which reminds me of this bug, where we still don't have a
root cause for the _crash_ during the warning:
https://lore.kernel.org/oe-lkp/202306131354.A499DE60@keescook/
but the new patch I linked to above fixes the source of the warning.

> I really couldn't figure out myself what went wrong with this one?

Having the crash scroll off the page is pretty frustrating. I wonder if
the kernel crash handler could be changed to repeat the RIP at the end of
the crash...

-Kees

-- 
Kees Cook