https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #75 from Kees Cook (kees@xxxxxxxxxxx) ---
Hi!

First, let me say sorry for all the work my patch has caused! It seems like
it might be tickling another (previously dormant) bug in the GPU driver.

(In reply to mnrzk from comment #30)
> I've been looking at this bug for a while now and I'll try to share what
> I've found about it.
>
> In some conditions, when amdgpu_dm_atomic_commit_tail calls
> dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> dm_atomic_state* with an garbage context pointer.
>
> I've also found that this bug exclusively occurs when commit_work is on the
> workqueue. After forcing drm_atomic_helper_commit to run all of the commits
> without adding to the workqueue and running the OS, the issue seems to have
> disappeared. The system was stable for at least 1.5 hours before I manually
> shut it down (meanwhile it has usually crashed within 30-45 minutes).
>
> Perhaps there's some sort of race condition occurring after commit_work is
> queued?

If it helps to explain what's happening in 3202fa62f: the kernel memory
allocator moved its free pointer from offset 0 to the middle of the object.
When memory is freed, the allocator writes 8 bytes into it to link the newly
freed object into its freelist. That has always happened, but since 3202fa62f
those 8 bytes are written in the middle of the object, not at offset 0. If
the work queue is trying to use freed memory, then before it didn't notice
the first 8 bytes getting overwritten, and now it appears to notice the
overwrite in the middle... but that still means something is freeing memory
before it should. Finding that might be a real trick.
:(

However, if you've suffered through all those bisections, I wonder if you can
try one other thing, which is to compile the kernel with KASAN:

CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y

This will make things _slow_, which might mean the use-after-free race never
triggers. *However*, it's possible that KASAN will catch the bad access on
its own, without the race having to line up the way it does to trigger the
behavior you're seeing. (And note that swapping CONFIG_KASAN_OUTLINE=y for
CONFIG_KASAN_INLINE=y might speed things up, but the kernel image gets
bigger.)

I'm going to try to read the work queue code for the driver and see if
anything obvious stands out...