On Tue 28-02-23 12:58:07, Dave Chinner wrote:
> On Fri, Feb 24, 2023 at 07:46:57PM -0800, Darrick J. Wong wrote:
> > So xfs_dir2_sf_replace can rewrite the shortform structure (or even
> > convert it to block format!) while readdir is accessing it.  Or am I
> > missing something?
>
> True, I missed that.
>
> Hmmmm. ISTR that holding ILOCK over filldir callbacks causes
> problems with lock ordering[1], and that's why we removed the ILOCK
> from the getdents path in the first place and instead relied on the
> IOLOCK being held by the VFS across readdir for exclusion against
> concurrent modification from the VFS.
>
> Yup, the current code only holds the ILOCK for the extent lookup and
> buffer read process, it drops it while it is walking the locked
> buffer and calling the filldir callback. Which is why we don't hold
> it for xfs_dir2_sf_getdents() - the VFS is supposed to be holding
> i_rwsem in exclusive mode for any operation that modifies a
> directory entry. We should only need the ILOCK for serialising the
> extent tree loading, not for serialising access vs modification to
> the directory.
>
> So, yeah, I think you're right, Darrick. And the fix is that the VFS
> needs to hold the i_rwsem correctly for all inodes that may be
> modified during rename...

But Al Viro didn't want to lock the inode in the VFS (as some filesystems
don't need the lock), so in ext4 we ended up grabbing the lock in
ext4_rename() like:

+	/*
+	 * We need to protect against old.inode directory getting
+	 * converted from inline directory format into a normal one.
+	 */
+	inode_lock_nested(old.inode, I_MUTEX_NONDIR2);

(Linus didn't merge the ext4 pull request so the change isn't upstream
yet).

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR