Am 04.07.2018 um 11:09 schrieb Michel Dänzer: > On 2018-07-04 10:31 AM, Christian König wrote: >> Am 26.06.2018 um 16:31 schrieb Michel Dänzer: >>> From: Michel Dänzer <michel.daenzer at amd.com> >>> >>> Fixes the BUG_ON spuriously triggering under the following >>> circumstances: >>> >>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using >>>   the same reservation object, so it calls >>>   reservation_object_reserve_shared with that reservation object once >>>   for each such BO. >>> * In reservation_object_reserve_shared, old->shared_count == >>>   old->shared_max - 1, so obj->staged is freed in preparation of an >>>   in-place update. >>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence >>>   once for each of the BOs above, always with the same fence. >>> * The first call adds the fence in the remaining free slot, after which >>>   old->shared_count == old->shared_max. >> Well, the explanation here is not correct. For multiple BOs using the >> same reservation object we won't call >> reservation_object_add_shared_fence() multiple times because we move >> those to the duplicates list in ttm_eu_reserve_buffers(). >> >> But this bug can still happen because we call >> reservation_object_add_shared_fence() manually with fences for the same >> context in a couple of places. >> >> One prominent case which comes to my mind are for the VM BOs during >> updates. Another possibility are VRAM BOs which need to be cleared. > Thanks. How about the following: > > * ttm_eu_reserve_buffers calls reservation_object_reserve_shared. > * In reservation_object_reserve_shared, shared_count == shared_max - 1, > so obj->staged is freed in preparation of an in-place update. > * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence, > after which shared_count == shared_max. > * The amdgpu driver also calls reservation_object_add_shared_fence for > the same reservation object, and the BUG_ON triggers. I would rather completely drop the reference to the ttm_eu_* functions, cause those wrappers are completely unrelated to the problem. Instead let's just note something like the following: * When reservation_object_reserve_shared is called with shared_count == shared_max - 1,  so obj->staged is freed in preparation of an in-place update. * Now reservation_object_add_shared_fence is called with the first fence and after that shared_count == shared_max. * After that reservation_object_add_shared_fence can be called with follow up fences from the same context, but since shared_count == shared_max we would run into this BUG_ON. > However, nothing bad would happen in > reservation_object_add_shared_inplace, since all fences use the same > context, so they can only occupy a single slot. > > Prevent this by moving the BUG_ON to where an overflow would actually > happen (e.g. if a buggy caller didn't call > reservation_object_reserve_shared before). > > > Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2, > as I suspect this fix is necessary under the circumstances described > there as well. The rest sounds good to me. Regards, Christian.