On 2023-09-20 18:07:35 [-0400], John B. Wyatt IV wrote:
> Hello everyone,
Hi,

> While backporting i915 fixes to the RHEL9 kernel for a similar looking
> issue; I noticed the commits that worked for RHEL8 did not work for RHEL9.
>
> Testing the (almost) latest release: 6.5.2-rt8; showed a lot of call traces
> on RHEL9. [1] being the most common one and it repeats itself on suspend.

A warn-once might help to reduce them so they can be worked on one by
one.

> [2] was the second one to show and seems to be the second most common
> call trace. This was tested on a Framework Alder Lake laptop with i915
> graphics. There was a total of 36 call traces before suspend and
> additional 12 after suspend (once again, [1]).
>
> When I tested on 6.6.0-rc1-rt1 the kernel crashed on boot. I did not
> have a way to pull the information and was transcribed manually. [3]
>
> [1]

Both:
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 6, expected: 0
> 12 locks held by gnome-shell/6590:
…
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 5, expected: 0

are might-sleep splats. I don't see these on my notebook/desktop on
6.6-rc. I don't remember doing a suspend on the 6.5 notebook, but I did
that on my desktop for testing.
It looks like, due to the "locks", the RCU nest depth is > 0 and then
the splat triggers because it assumes that it will schedule out, which
is okay on RT. But then it is not okay for the ww-mutex to do so, so I
am a little confused whether this is an RT-only problem or also a
non-RT one.
But maybe it is just a try-lock and the warning is just wrongly
triggered…

> [3]
>
> general protection fault, probably for non-canonical address 0xdffffc0004: 0000(#1) PREEMPT_RT SMP KASAN NOPRI
> KASAM: null-ptr-deref in range [0x000...20-0x000...27]
> RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87)
> [snipped]
> PKRU: 5555554
> Call Trace:
> <TASK>
> usci_destroy+0xe/0x20
> ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207)

This is odd. That means that ucsi_register() failed and debugfs was set
up and is NULL. And the check in line 87 checks ucsi, which is non-NULL,
and ucsi->debugfs, which is NULL. So it should return, but somehow it
does this.
Does this also trigger without KASAN? In the meantime let me try to
enable KASAN…

Sebastian