On Wed, May 11, 2022 at 06:40:32PM +0300, Dmitry Osipenko wrote: > On 5/11/22 18:29, Daniel Vetter wrote: > > On Wed, May 11, 2022 at 06:14:00PM +0300, Dmitry Osipenko wrote: > >> On 5/11/22 17:24, Christian König wrote: > >>> Am 11.05.22 um 15:00 schrieb Daniel Vetter: > >>>> On Tue, May 10, 2022 at 04:39:53PM +0300, Dmitry Osipenko wrote: > >>>>> [SNIP] > >>>>> Since vmapping implies implicit pinning, we can't use a separate lock in > >>>>> drm_gem_shmem_vmap() because we need to protect the > >>>>> drm_gem_shmem_get_pages(), which is invoked by drm_gem_shmem_vmap() to > >>>>> pin the pages and requires the dma_resv_lock to be locked. > >>>>> > >>>>> Hence the problem is: > >>>>> > >>>>> 1. If dma-buf importer holds the dma_resv_lock and invokes > >>>>> dma_buf_vmap() -> drm_gem_shmem_vmap(), then drm_gem_shmem_vmap() shall > >>>>> not take the dma_resv_lock. > >>>>> > >>>>> 2. Since dma-buf locking convention isn't specified, we can't assume > >>>>> that dma-buf importer holds the dma_resv_lock around dma_buf_vmap(). > >>>>> > >>>>> The possible solutions are: > >>>>> > >>>>> 1. Specify the dma_resv_lock convention for dma-bufs and make all > >>>>> drivers to follow it. > >>>>> > >>>>> 2. Make only DRM drivers to hold dma_resv_lock around dma_buf_vmap(). > >>>>> Other non-DRM drivers will get the lockdep warning. > >>>>> > >>>>> 3. Make drm_gem_shmem_vmap() to take the dma_resv_lock and get deadlock > >>>>> if dma-buf importer holds the lock. > >>>>> > >>>>> ... > >>>> Yeah this is all very annoying. > >>> Ah, yes that topic again :) > >>> > >>> I think we could relatively easily fix that by just defining and > >>> enforcing that the dma_resv_lock must have be taken by the caller when > >>> dma_buf_vmap() is called. > >>> > >>> A two step approach should work: > >>> 1. Move the call to dma_resv_lock() into the dma_buf_vmap() function and > >>> remove all lock taking from the vmap callback implementations. > >>> 2. Move the call to dma_resv_lock() into the callers of dma_buf_vmap() > >>> and enforce that the function is called with the lock held. > >> I've doubts about the need to move out the dma_resv_lock() into the > >> callers of dma_buf_vmap().. > >> > >> I looked through all the dma_buf_vmap() users and neither of them > >> interacts with dma_resv_lock() at all, i.e. nobody takes the lock > >> in/outside of dma_buf_vmap(). Hence it's easy and more practical to make > >> dma_buf_mmap/vmap() to take the dma_resv_lock by themselves. > > i915_gem_dmabuf_vmap -> i915_gem_object_pin_map_unlocked -> > > i915_gem_object_lock -> dma_resv_lock > > > > And all the ttm drivers should work similarly. So there's definitely > > drivers which grab dma_resv_lock from their vmap callback. > > Grr.. I'll take another look. > > >> It's unclear to me which driver may ever want to do the mapping under > >> the dma_resv_lock. But if we will ever have such a driver that will need > >> to map imported buffer under dma_resv_lock, then we could always add the > >> dma_buf_vmap_locked() variant of the function. In this case the locking > >> rule will sound like this: > >> > >> "All dma-buf importers are responsible for holding the dma-reservation > >> lock around the dmabuf->ops->mmap/vmap() calls." > > Are you okay with this rule? Yeah I think long-term it's where we want to be, just trying to find clever ways to get there. And I think Christian agrees with that? > >>> It shouldn't be that hard to clean up. The last time I looked into it my > >>> main problem was that we didn't had any easy unit test for it. > >> Do we have any tests for dma-bufs at all? It's unclear to me what you > >> are going to test in regards to the reservation locks, could you please > >> clarify? > > Unfortunately not really :-/ Only way really is to grab a driver which > > needs vmap (those are mostly display drivers) on an imported buffer, and > > see what happens. > > > > 2nd best is liberally sprinkling lockdep annotations all over the place > > and throwing it at intel ci (not sure amd ci is accessible to the public) > > and then hoping that's good enough. Stuff like might_lock and > > dma_resv_assert_held. > > Alright So throwing it at intel-gfx-ci can't hurt I think, but that only covers i915 so doesn't really help with the bigger issue of catching all the drivers. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch