On Tue, Dec 10, 2024 at 01:37:11PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Dec 10, 2024 at 02:24:56PM +0200, Jani Nikula wrote:
> > On Tue, 10 Dec 2024, Genes Lists <lists@xxxxxxxxxxxx> wrote:
> > > On Tue, 2024-12-10 at 10:58 +0200, Jani Nikula wrote:
> > >> On Tue, 10 Dec 2024, Sakari Ailus <sakari.ailus@xxxxxxxxxxxxxxx> wrote:
> > >> > Hi,
> > >> >
> > >> > > ...
> > >> > > FYI 6.12.4 got a crash shortly after booting in dma_alloc_attrs -
> > >> > > maybe triggered in ipu6_probe. Crash only happened on laptop with
> > >> > > ipu6. All other machines are running fine.
> > >> >
> > >> > Have you read the dmesg further than the IPU6 related warning? The
> > >> > IPU6 driver won't work (maybe not even probe?) but if the system
> > >> > crashes, it appears unlikely the IPU6 drivers would have something
> > >> > to do with that. Look for warnings on linked list corruption later,
> > >> > they seem to be coming from the i915 driver.
> > >>
> > >> And the list corruption is actually happening in
> > >> cpu_latency_qos_update_request(). I don't see any i915 changes in
> > >> 6.12.4 that could cause it.
> > >>
> > >> I guess the question is, when did it work? Did 6.12.3 work?
> > >>
> > >>
> > >> BR,
> > >> Jani.
> > >
> > >
> > >  - 6.12.1 worked
> > >
> > >  - mainline - works (but only with i915 patch set [1] otherwise there
> > >    are no graphics at all)
> > >
> > >    [1] https://patchwork.freedesktop.org/series/141911/
> > >
> > >  - 6.12.3 - crashed (i see i915 not ipu6) and again it has
> > >    cpu_latency_qos_update_request+0x61/0xc0
> >
> > Thanks for testing.
> >
> > There are no changes to either i915 or kernel/power between 6.12.1 and
> > 6.12.4.
> >
> > There are some changes to drm core, but none that could explain this.
> >
> > Maybe try the same kernels a few more times to see if it's really
> > deterministic? Not that I have obvious ideas where to go from there, but
> > it's a clue nonetheless.
>
> 'git bisect' would be nice to run if possible...

I've reproduced the issue. It's caused by this 6.12.y commit:

commit 6ac269abab9ca5ae910deb2d3ca54351c3467e99
Author: Bingbu Cao <bingbu.cao@xxxxxxxxx>
Date:   Wed Oct 16 15:53:01 2024 +0800

    media: ipu6: not override the dma_ops of device in driver

    [ Upstream commit daabc5c64703432c4a8798421a3588c2c142c51b ]

It makes alloc_fw_msg_bufs() fail in isys_probe():

        cpu_latency_qos_add_request(&isys->pm_qos, PM_QOS_DEFAULT_VALUE);

        ret = alloc_fw_msg_bufs(isys, 20);
        if (ret < 0)
                goto out_remove_pkg_dir_shared_buffer;

On the error path we do not call cpu_latency_qos_remove_request(),
which causes pm_qos_request list corruption (it is a use-after-free
bug).

The problem will disappear after applying:
https://lore.kernel.org/stable/20241209175416.59433-1-stanislaw.gruszka@xxxxxxxxxxxxxxx/
since the allocation will no longer fail. But we also need to handle the
failure case correctly by adding cpu_latency_qos_remove_request() on the
error path. This requires a mainline fix; I'll post it.

Regards
Stanislaw
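
To illustrate the error-path problem described above, here is a minimal
userspace model in C. It is only a sketch, not the kernel fix: every name in
it (qos_request, qos_list, model_probe, model_alloc_bufs, and so on) is a
hypothetical stand-in, and the linked list is a simplified assumption about
how a request ends up referenced from a global list. The point it shows is
that a request added before a failing allocation has to be unlinked before
the structure embedding it is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QoS request node; not the kernel structure. */
struct qos_request {
	struct qos_request *prev, *next;
};

/* Global circular list head, modelling the list that tracks requests. */
static struct qos_request qos_list = { &qos_list, &qos_list };

/* Link a request at the front of the global list. */
void qos_add_request(struct qos_request *req)
{
	req->next = qos_list.next;
	req->prev = &qos_list;
	qos_list.next->prev = req;
	qos_list.next = req;
}

/* Unlink a request; the assert catches removal of an unlinked node. */
void qos_remove_request(struct qos_request *req)
{
	assert(req->next != NULL && req->prev != NULL);
	req->prev->next = req->next;
	req->next->prev = req->prev;
	req->next = req->prev = NULL;
}

/* Count the requests currently linked on the global list. */
size_t qos_list_len(void)
{
	size_t n = 0;

	for (struct qos_request *r = qos_list.next; r != &qos_list; r = r->next)
		n++;
	return n;
}

/* Hypothetical device structure that embeds its QoS request. */
struct isys_model {
	struct qos_request pm_qos;
};

/* Stand-in for the buffer allocation; 'fail' forces the error case. */
int model_alloc_bufs(bool fail)
{
	return fail ? -1 : 0;
}

/* Model of a probe routine: add the request, then allocate buffers. */
int model_probe(bool alloc_fails, struct isys_model **out)
{
	struct isys_model *isys = calloc(1, sizeof(*isys));
	int ret;

	if (!isys)
		return -1;

	qos_add_request(&isys->pm_qos);

	ret = model_alloc_bufs(alloc_fails);
	if (ret < 0)
		goto out_remove_qos;

	*out = isys;
	return 0;

out_remove_qos:
	/* Skipping this unlink would leave the list pointing at freed memory. */
	qos_remove_request(&isys->pm_qos);
	free(isys);
	return ret;
}

/* Model of the normal teardown path. */
void model_remove(struct isys_model *isys)
{
	qos_remove_request(&isys->pm_qos);
	free(isys);
}
```

In this model, dropping the qos_remove_request() call under out_remove_qos
leaves a dangling node on qos_list, so the next list walk or update touches
freed memory. That is the same class of bug as the one reported in this
thread.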