On Fri 12-07-24 10:37:08, Theodore Ts'o wrote: > On Fri, Jul 12, 2024 at 02:27:20PM +0800, Zhihao Cheng wrote: > > Problem description > > =================== > > > > The inode reclaiming process(See function prune_icache_sb) collects all > > reclaimable inodes and mark them with I_FREEING flag at first, at that > > time, other processes will be stuck if they try getting these inodes(See > > function find_inode_fast), then the reclaiming process destroy the > > inodes by function dispose_list(). > > Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may > > do inode lookup in the inode evicting callback function, if the inode > > lookup is operated under the inode lru traversing context, deadlock > > problems may happen. > > > > Case 1: In function ext4_evict_inode(), the ea inode lookup could happen > > if ea_inode feature is enabled, the lookup process will be stuck under > > the evicting context like this: > > > > 1. File A has inode i_reg and an ea inode i_ea > > 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea > > 3. Then, following three processes running like this: > > > > PA PB > > echo 2 > /proc/sys/vm/drop_caches > > shrink_slab > > prune_dcache_sb > > // i_reg is added into lru, lru->i_ea->i_reg > > prune_icache_sb > > list_lru_walk_one > > inode_lru_isolate > > i_ea->i_state |= I_FREEING // set inode state > > i_ea->i_state |= I_FREEING // set inode state > > Um, I don't see how this can happen. If the ea_inode is in use, > i_count will be greater than zero, and hence the inode will never be > go down the rest of the path in inode_lru_inode(): > > if (atomic_read(&inode->i_count) || > ...) { > list_lru_isolate(lru, &inode->i_lru); > spin_unlock(&inode->i_lock); > this_cpu_dec(nr_unused); > return LRU_REMOVED; > } > > Do you have an actual reproduer which triggers this? Or would this > happen be any chance something that was dreamed up with DEPT? No, it looks like a real problem and I agree with the analysis. We don't hold ea_inode reference (i.e., ea_inode->i_count) from a normal inode. The normal inode just owns that that special on-disk xattr reference. Standard inode references are acquired and dropped as needed. And this is exactly the problem: ext4_xattr_inode_dec_ref_all() called from evict() needs to lookup the ea_inode and iget() it. So if we are processing a list of inodes to dispose, all inodes have I_FREEING bit already set and if ea_inode and its parent normal inode are both in the list, then the evict()->ext4_xattr_inode_dec_ref_all()->iget() will deadlock. Normally we don't hit this path because LRU list walk is not handling inodes with 0 link count. But a race with unlink can make that happen with iput() from inode_lru_isolate(). I'm pondering about the best way to fix this. Maybe we could handle the need for inode pinning in inode_lru_isolate() in a similar way as in writeback code so that last iput() cannot happen from inode_lru_isolate(). In writeback we use I_SYNC flag to pin the inode and evict() waits for this flag to clear. I'll probably sleep to it and if I won't find it too disgusting to live tomorrow, I can code it. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR