On Tue 23-07-24 09:59:54, David Howells wrote: > When using cachefiles, lockdep may emit something similar to the circular > locking dependency notice below. The problem appears to stem from the > following: > > (1) Cachefiles manipulates xattrs on the files in its cache when called > from ->writepages(). > > (2) The setxattr() and removexattr() system call handlers get the name > (and value) from userspace after taking the sb_writers lock, putting > accesses of the vma->vm_lock and mm->mmap_lock inside of that. > > (3) The afs filesystem uses a per-inode lock to prevent multiple > revalidation RPCs and in writeback vs truncate to prevent parallel > operations from deadlocking against the server on one side and local > page locks on the other. > > Fix this by moving the getting of the name and value in {get,remove}xattr() > outside of the sb_writers lock. This also has the minor benefits that we > don't need to reget these in the event of a retry and we never try to take > the sb_writers lock in the event we can't pull the name and value into the > kernel. Well, it seems like you are trying to get rid of the dependency sb_writers->mmap_sem. But there are other places where this dependency is created, in particular write(2) path is a place where it would be very difficult to get rid of it (you take sb_writers, then do all the work preparing the write and then you copy user data into page cache which may require mmap_sem). But looking at the lockdep splat below: > ====================================================== > WARNING: possible circular locking dependency detected > 6.10.0-build2+ #956 Not tainted > ------------------------------------------------------ > fsstress/6050 is trying to acquire lock: > ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0 > > but task is already holding lock: > ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #4 (&vma->vm_lock->lock){++++}-{3:3}: > __lock_acquire+0xaf0/0xd80 > lock_acquire.part.0+0x103/0x280 > down_write+0x3b/0x50 > vma_start_write+0x6b/0xa0 > vma_link+0xcc/0x140 > insert_vm_struct+0xb7/0xf0 > alloc_bprm+0x2c1/0x390 > kernel_execve+0x65/0x1a0 > call_usermodehelper_exec_async+0x14d/0x190 > ret_from_fork+0x24/0x40 > ret_from_fork_asm+0x1a/0x30 > > -> #3 (&mm->mmap_lock){++++}-{3:3}: > __lock_acquire+0xaf0/0xd80 > lock_acquire.part.0+0x103/0x280 > __might_fault+0x7c/0xb0 > strncpy_from_user+0x25/0x160 > removexattr+0x7f/0x100 > __do_sys_fremovexattr+0x7e/0xb0 > do_syscall_64+0x9f/0x100 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > -> #2 (sb_writers#14){.+.+}-{0:0}: > __lock_acquire+0xaf0/0xd80 > lock_acquire.part.0+0x103/0x280 > percpu_down_read+0x3c/0x90 > vfs_iocb_iter_write+0xe9/0x1d0 > __cachefiles_write+0x367/0x430 > cachefiles_issue_write+0x299/0x2f0 > netfs_advance_write+0x117/0x140 > netfs_write_folio.isra.0+0x5ca/0x6e0 > netfs_writepages+0x230/0x2f0 > afs_writepages+0x4d/0x70 > do_writepages+0x1e8/0x3e0 > filemap_fdatawrite_wbc+0x84/0xa0 > __filemap_fdatawrite_range+0xa8/0xf0 > file_write_and_wait_range+0x59/0x90 > afs_release+0x10f/0x270 > __fput+0x25f/0x3d0 > __do_sys_close+0x43/0x70 > do_syscall_64+0x9f/0x100 > entry_SYSCALL_64_after_hwframe+0x76/0x7e This is the problematic step - from quite deep in the locking chain holding invalidate_lock and having PG_Writeback set you suddently jump to very outer locking context grabbing sb_writers. Now AFAICT this is not a real deadlock problem because the locks are actually on different filesystems, just lockdep isn't able to see this. So I don't think you will get rid of these lockdep splats unless you somehow manage to convey to lockdep that there's the "upper" fs (AFS in this case) and the "lower" fs (the one behind cachefiles) and their locks are different. > -> #1 (&vnode->validate_lock){++++}-{3:3}: > __lock_acquire+0xaf0/0xd80 > lock_acquire.part.0+0x103/0x280 > down_read+0x95/0x200 > afs_writepages+0x37/0x70 > do_writepages+0x1e8/0x3e0 > filemap_fdatawrite_wbc+0x84/0xa0 > filemap_invalidate_inode+0x167/0x1e0 > netfs_unbuffered_write_iter+0x1bd/0x2d0 > vfs_write+0x22e/0x320 > ksys_write+0xbc/0x130 > do_syscall_64+0x9f/0x100 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > -> #0 (mapping.invalidate_lock#3){++++}-{3:3}: > check_noncircular+0x119/0x160 > check_prev_add+0x195/0x430 > __lock_acquire+0xaf0/0xd80 > lock_acquire.part.0+0x103/0x280 > down_read+0x95/0x200 > filemap_fault+0x26e/0x8b0 > __do_fault+0x57/0xd0 > do_pte_missing+0x23b/0x320 > __handle_mm_fault+0x2d4/0x320 > handle_mm_fault+0x14f/0x260 > do_user_addr_fault+0x2a2/0x500 > exc_page_fault+0x71/0x90 > asm_exc_page_fault+0x22/0x30 Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR