https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #76 from mnrzk@xxxxxxxxxxxxxx ---
(In reply to Kees Cook from comment #75)
> Hi!
>
> First, let me say sorry for all the work my patch has caused! It seems like
> it might be tickling another (previously dormant) bug in the gpu driver.
>
>
> (In reply to mnrzk from comment #30)
> > I've been looking at this bug for a while now and I'll try to share what
> > I've found about it.
> >
> > In some conditions, when amdgpu_dm_atomic_commit_tail calls
> > dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> > dm_atomic_state* with a garbage context pointer.
> >
> > I've also found that this bug exclusively occurs when commit_work is on the
> > workqueue. After forcing drm_atomic_helper_commit to run all of the commits
> > without adding to the workqueue and running the OS, the issue seems to have
> > disappeared. The system was stable for at least 1.5 hours before I manually
> > shut it down (meanwhile it has usually crashed within 30-45 minutes).
> >
> > Perhaps there's some sort of race condition occurring after commit_work is
> > queued?
>
> If it helps to explain what's happening in 3202fa62f, the kernel memory
> allocator is moving its free pointer from offset 0 to the middle of the
> object. That means that when the memory is freed, it writes 8 bytes to join
> the newly freed memory into the allocator's freelist. That always happened,
> but after 3202fa62f it began writing it in the middle, not offset 0. If the
> work queue is trying to use freed memory, and before it didn't notice the
> first 8 bytes getting written, now it appears to notice the overwrite... but
> that still means something is freeing memory before it should.
>
> Finding that might be a real trick. :( However, if you've suffered through
> all those bisections, I wonder if you can try one other thing, which is to
> compile the kernel with KASAN:
>
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> CONFIG_KASAN_OUTLINE=y
> CONFIG_KASAN_STACK=y
> CONFIG_KASAN_VMALLOC=y
>
> This will make things _slow_, which might mean the use-after-free race may
> never trigger. *However* it's possible that it'll catch a bad behavior
> before it even needs to get hit in a race that triggers the behavior you're
> seeing. (And note that swapping CONFIG_KASAN_OUTLINE=y for
> CONFIG_KASAN_INLINE=y might speed things up, but the kernel image gets
> bigger).
>
> I'm going to try to read the work queue code for the driver and see if
> anything obvious stands out...

Actually, this makes perfect sense. struct dm_atomic_state *dm_state has two
components: base (a struct containing a struct drm_atomic_state *) and
context (a struct dc_state *). Reading through the code of
amdgpu_dm_atomic_commit_tail, I see that dm_state->base is never used.

If my understanding is correct, base would previously have been overwritten
with the freelist pointer (since it occupies the first 8 bytes). Now that the
freelist pointer is placed in the middle of the object (aligned to
sizeof(void *), i.e. 8 bytes), it lands in the last 8 bytes of *dm_state,
which is dm_state->context.

I'll add a void * padding member in the middle of struct dm_atomic_state. If
my hypothesis is correct, the padding will be overwritten with garbage instead
of context, and the bug should go away. Of course, the use-after-free would
still be in the code and may cause other issues in the future, so I wouldn't
really consider that a solution.

Regarding KASAN, I've tried compiling the kernel with KASAN enabled, and in my
experience the bug did not trigger after actively using the system for 3 hours
and leaving it on for 12 hours. That was almost a month ago, though, so maybe
I'll try again with different KASAN options (e.g. CONFIG_KASAN_INLINE=y).
If anyone has any more tips on getting KASAN to run faster, I'll be glad to
hear them.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel