On Fri, 2024-06-28 at 20:06 +0200, Daniel Vetter wrote:
> On Thu, Jun 27, 2024 at 02:18:44PM +0200, Thomas Hellström wrote:
> > On Thu, 2024-06-27 at 10:04 +0200, Daniel Vetter wrote:
> > > On Wed, Jun 26, 2024 at 05:58:02PM +0200, Thomas Hellström wrote:
> > > > Hi!
> > > >
> > > > I'm seeing the below lockdep splat 1) with the xe driver in an
> > > > imported dma-buf object destruction path.
> > > >
> > > > It's not because we hold the dma_resv lock at that point, but
> > > > rather because we hold *another* dma_resv lock at that point,
> > > > and the dma_resv detach happens when the object is idle. In
> > > > this case it was idle at the final put(), and dma_buf_detach()
> > > > is called in the putting process.
> > > >
> > > > Holding another dma-buf lock might happen as part of
> > > > drm_exec_unlock_all, or simply if the wider vm dma_resv was
> > > > held at object put time, so it's not an uncommon pattern, even
> > > > if the drm_exec instance can be fixed by putting all bos after
> > > > unlocking them all.
> > > >
> > > > Two solutions come to mind here:
> > > >
> > > > 1) Provide a dma_buf_detach_locked()
> > >
> > > This smells way too much like the endless headaches we had with
> > > drm_gem_object_put_locked and friends against
> > > drm_device.struct_mutex. Or I'm not understanding what you're
> > > doing, because I'm pretty sure you have to take the dma_resv
> > > lock on final put() of imported objects. Because that final
> > > put() is of the import wrapper, the exporter (and other
> > > importers) can still get at that object, so dma_resv_lock is
> > > very much needed.
> >
> > Yeah, the TTM final put looks like
> >
> > 	if (!dma_resv_trylock() || !idle)
> > 		queue_work(final_destruction);
> >
> > 	dma_resv_unlock();
> > 	dma_buf_detach(); <--- lockdep splat
> >
> > Here's where a dma_buf_detach_locked() would've made sense, before
> > the dma_resv_unlock().
> >
> > But if you think this will cause grief, I'm completely fine with
> > fixing this in TTM by always taking the deferring path.
>
> Oh, I misunderstood what you meant; I thought you wanted to do a
> huge exercise in passing a "do we know we're locked" flag all the
> way through entire callchains to exporters.
>
> If it's just so that the fastpath of bypassing the worker can
> function for imported buffers, then I think that's fine, as long as
> we just punt to the worker if we can't get the lock.

OK. TBH, the driver would need a drm_prime_gem_destroy_locked() as
well, since that's the function that calls dma_buf_detach(). But I
think it's worth it anyway: if we just modify TTM to always take the
delayed destruction path, I figure much code will come to depend on
that, and it will be invasive to update later.

I'll take a quick stab at that to see how ugly it becomes.

/Thomas

> -Sima
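
For reference, here is a minimal kernel-style C sketch of the fast
path discussed above, not a definitive implementation.
dma_resv_trylock(), dma_resv_test_signaled(), dma_resv_unlock() and
queue_work() are existing kernel APIs; dma_buf_detach_locked() is the
hypothetical variant proposed in this thread, and the function name
ttm_bo_release_sketch() plus the exact idle check and worker wiring
are assumptions for illustration:

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>
	#include <linux/workqueue.h>
	#include <drm/ttm/ttm_bo.h>

	/*
	 * Sketch of the final-put fast path: detach while the object's
	 * dma_resv is still held, instead of unlocking and letting
	 * dma_buf_detach() re-take the lock, which is what triggers the
	 * splat when the caller already holds another dma_resv.
	 */
	static void ttm_bo_release_sketch(struct ttm_buffer_object *bo)
	{
		/* Can't get the lock: punt to the delayed-destroy worker. */
		if (!dma_resv_trylock(bo->base.resv)) {
			queue_work(system_wq, &bo->delayed_delete);
			return;
		}

		/* Locked but not idle: unlock and punt as well. */
		if (!dma_resv_test_signaled(bo->base.resv,
					    DMA_RESV_USAGE_BOOKKEEP)) {
			dma_resv_unlock(bo->base.resv);
			queue_work(system_wq, &bo->delayed_delete);
			return;
		}

		/* Idle and locked: detach before dropping the lock. */
		if (bo->base.import_attach)
			dma_buf_detach_locked(bo->base.import_attach->dmabuf,
					      bo->base.import_attach); /* hypothetical */
		dma_resv_unlock(bo->base.resv);
	}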
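
And a sketch of the corresponding helper on the drm_prime side,
mirroring the existing drm_prime_gem_destroy() but using the proposed
locked detach so it can run with the object's dma_resv held.
drm_prime_gem_destroy_locked() and dma_buf_detach_locked() are the
proposals from this thread, not existing API; dma_buf_unmap_attachment()
and dma_buf_put() exist today:

	/*
	 * Hypothetical _locked variant of drm_prime_gem_destroy(); the
	 * caller holds obj->resv (which, for an imported object, is the
	 * dma-buf's resv). Save the dma_buf pointer before detaching,
	 * since detach frees the attachment.
	 */
	static void drm_prime_gem_destroy_locked(struct drm_gem_object *obj,
						 struct sg_table *sg)
	{
		struct dma_buf_attachment *attach = obj->import_attach;
		struct dma_buf *dma_buf = attach->dmabuf;

		if (sg)
			dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
		dma_buf_detach_locked(dma_buf, attach);	/* hypothetical */
		dma_buf_put(dma_buf);
	}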