Thomas Zimmermann <tzimmermann@xxxxxxx> writes:

> Hi
>
> On 27.11.23 at 17:25, Alyssa Ross wrote:
>> Thomas Zimmermann <tzimmermann@xxxxxxx> writes:
>>
>>> Invoke drm_plane_helper_funcs.end_fb_access before
>>> drm_atomic_helper_commit_hw_done(). The latter function hands over
>>> ownership of the plane state to the following commit, which might
>>> free it. Releasing resources in end_fb_access then operates on undefined
>>> state. This bug has been observed with non-blocking commits when they
>>> are being queued up quickly.
>>>
>>> Here is an example stack trace from the bug report. The plane state has
>>> been freed already, so the pages for drm_gem_fb_vunmap() are gone.
>>>
>>> Unable to handle kernel paging request at virtual address 0000000100000049
>>> [...]
>>> drm_gem_fb_vunmap+0x18/0x74
>>> drm_gem_end_shadow_fb_access+0x1c/0x2c
>>> drm_atomic_helper_cleanup_planes+0x58/0xd8
>>> drm_atomic_helper_commit_tail+0x90/0xa0
>>> commit_tail+0x15c/0x188
>>> commit_work+0x14/0x20
>>>
>>> For aborted commits, it is still ok to run end_fb_access as part of the
>>> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>>>
>>> v2:
>>> * fix test in drm_atomic_helper_cleanup_planes()
>>>
>>> Reported-by: Alyssa Ross <hi@xxxxxxxxx>
>>> Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@xxxxxxxxx/
>>> Suggested-by: Daniel Vetter <daniel@xxxxxxxx>
>>> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers")
>>> Signed-off-by: Thomas Zimmermann <tzimmermann@xxxxxxx>
>>> Cc: <stable@xxxxxxxxxxxxxxx> # v6.2+
>>> ---
>>>  drivers/gpu/drm/drm_atomic_helper.c | 17 +++++++++++++++++
>>>  1 file changed, 17 insertions(+)
>>
>> Got this basically immediately. :(
>
> I've never seen such problems on other systems. Is there anything
> different about the Mac systems? How do you trigger these errors?

My understanding is that all sorts of things are different, but I don't
know too much about the details.
There's of course a chance that some other change in the Asahi Linux
kernel causes this problem to surface. As I said, I reviewed the diff
against mainline and didn't see anything that looked relevant, but I
could well have missed something. I don't think I can test mainline
directly, as it doesn't yet support enough of the hardware. For slightly
older Apple Silicon Mac models, I think enough is upstream that this
would be possible, but I don't have access to any.

I started off encountering these errors every few days. I noticed them
because they would sometimes cause my system to start freezing, either
for 10 seconds at a time or until I switched VTs. They seem to correlate
with the system being under high CPU load. I was also able to
substantially increase how often they occurred by adding logging to the
kernel: even just drm.debug=0x10 makes a big difference, and when I also
added a few dump_backtrace() calls while trying to understand the code
and diagnose the problem, I would fairly consistently hit an Oops within
a few minutes under load.

BTW: v3 is looking good so far. I've only been testing it since this
morning, though, so I'll keep trying it out a bit longer before I
declare the problem solved and send a Tested-by.