On Mon, Dec 30, 2024 at 10:22:27AM -0800, Suren Baghdasaryan wrote:
[...]
> > > >
> > > > Also a quick look seems to suggest that the lock dependency on CPU 1:
> > > >
> > > >     lock(&vma->vm_lock->lock);
> > > >     lock(sb_pagefaults#4);
> > > >
> > > > can happen in a page fault with a reader of &vma->vm_lock->lock.
> > >
> > > The report clearly indicates a call to vma_start_write(), which means
> > > vm_lock is being write-locked, not read-locked. That's why I commented
> > > that the report does not consider that mmap_write_lock is already
> > > taken when vma_start_write() is called.
> > >
> > > >
> > > > do_page_fault():
> > > >   lock_vma_under_rcu():
> > > >     vma_start_read():
> > > >       down_read_trylock(); // read lock &vma->vm_lock_lock here.
> > > >   ...
> > > >   handle_mm_fault():
> > > >     sb_start_pagefault(); // lock(sb_pagefaults#4);
> > > >
> > > > if so, an existing reader can block the other writer, so I don't think
> > > > the mmap_lock write protection can help here.
> > >
> > > In your example vma->vm_lock would be read-locked before
> > > po->pg_vec_lock but in the report po->pg_vec_lock is locked before
> > > vma->vm_lock->lock. I don't think what is reported here is the
> > > do_page_fault() path.
> > >
> >
> > You're missing the point, in the report, the current stack is indeed in
> > a write path (i.e. &mm->mmap_lock first and then &vma->vm_lock->lock),
> > however that's only part of the picture. The deadlock
> > possibility is due to that there could be a concurrent do_page_fault()
> > which will hold &vma->vm_lock->lock first and wait for another lock that
> > eventually has a dependency on a &mm->mmap_lock.
> >
> I need to see a more concrete example.
> Note that do_page_fault() does not even read-lock the mmap_lock when
> it uses vma->vm_lock, that's the whole point of per-vma locks that we
> avoid using mmap_lock. So, even if it later waits on some other lock
> that has mm->mmap_lock dependency, that should not block it.
> Again, you might be right and there might be a lockdep issue but I
> need a more specific example to see if it's real.
>

Understood. I clearly don't have the whole set of knowledge/skills to
make the call ;-) I just tried my best to figure out what lockdep
thought in this case (see the other email); it's quite fun to hunt down
a "deadlock" possibility involving 11 locks. Right now, I'm leaning
towards this being a false positive (about 80% sure), because one of
the dependencies was built during an initcall, so it may not happen in
real code, but I need to defer that to the drm folks.

Regards,
Boqun

> >
> > Regards,
> > Boqun
> >
[...]