Op 18-01-2021 om 16:05 schreef Thomas Hellström: > > On 1/18/21 3:46 PM, Maarten Lankhorst wrote: >> Op 18-01-2021 om 14:28 schreef Thomas Hellström: >>> On 1/18/21 2:22 PM, Thomas Hellström wrote: >>>> On 1/18/21 1:01 PM, Maarten Lankhorst wrote: >>>>> Op 18-01-2021 om 12:11 schreef Thomas Hellström: >>>>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote: >>>>>>> We get a lockdep splat when the reset mutex is held, because it can be >>>>>>> taken from fence_wait. This conflicts with the mmu notifier we have, >>>>>>> because we recurse between reset mutex and mmap lock -> mmu notifier. >>>>>>> >>>>>>> Remove this recursion by calling revoke_mmaps before taking the lock. >>>>>> Hmm. Is the mmap se taken from gt_revoke()? >>>>>> >>>>>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)? >>>>> Hey, >>>>> >>>>> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(), >>>>> >>>>> so this change should be ok since all those mappings are invalidated correctly and completed before this point. >>>>> >>>>> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok. >>>>> >>>>> ~Maarten >>>> Hmm, OK but then we still have the following established locking order. >>>> >>>> lock(fence_signaling) >>>> lock(i_mmap_lock) >>>> >>>> But in the notifier >>>> >>>> lock(i_mmap_lock) >>>> fence_signaling(within notifier) >>>> >>>> So gt_revoke() is violating dma-fence rules. >>>> >>>> BTW it looks to me like the reset mutex notation is actually doing much the same as the dma-fence annotations; While we can move gt_revoke() out of the reset mutex, that only gives us false hopes since it moves it out of the equivalent dma-fence annotation. I figure the reason this was not seen before the new code is that the reset mutex lockdep isn't taken when waiting for active. Only when waiting for dma-fence, but IMO the root problem is pre-existing. >>>> >>>> /Thomas >>>> >>>> >>> The interesting scenario is >>> >>> thread 1: >>> take i_mmap_lock() >>> enter_mmu_notifier() >>> wait_fence() >>> >>> thread 2: >>> need_to_reset_gpu_for_the_above_fence(); >>> take i_mmap_lock() >>> >>> Deadlock. >>> >>> /Thomas >>> >>> >> Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 1 isn't doing anything wrong, gpu reset probably should stop revoking gt bindings, and allow some garbage during reset. I don't see another way out. :-/ > > Me neither. > > But to silence lockdep until dma_fence annotation is widely added: > > Reviewed-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > > Ideally we'll add fence signaling annotations to gpu reset, to exactly detect these kind of things. Hopefully in the future. :) _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx