On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Sat, Oct 21, 2023 at 12:33:57PM -0700, Boqun Feng wrote: > > On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote: > > > I'm only an expert on the page cache, not the rest of the VFS. So > > > what are the rules around modifying i_state for the VFS? > > > > Agreed, same question here. > > inode->i_state should only be modified under inode->i_lock. > > And in most situations, you have to hold the inode->i_lock to read > state flags as well so that reads are serialised against > modifications which are typically non-atomic RMW operations. > > There is, I think, one main exception to read side locking and this > is find_inode_rcu() which does an unlocked check for I_WILL_FREE | > I_FREEING. In this case, the inode->i_state updates in iput_final() > use WRITE_ONCE under the inode->i_lock to provide the necessary > semantics for the unlocked READ_ONCE() done under rcu_read_lock(). > > IOWs, if you follow the general rule that any inode->i_state access > (read or write) needs to hold inode->i_lock, you probably won't > screw up. I don't see filesystems doing this though. In particular, see iget_locked() -- if a new inode is returned, then it is locked, but if a cached one is found, it's not locked. So we're in this situation where a returned inode may or may not be locked. And the way to determine if it's locked or not is to read i_state. Here are examples of kernfs, ext2, ext4 and squashfs doing it: https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/kernfs/inode.c#L252 https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext2/inode.c#L1392 https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext4/inode.c#L4707 https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/squashfs/inode.c#L82 They all call iget_locked(), and if I_NEW is set, they initialise the inode and unlock it with unlock_new_inode(); otherwise they just return the unlocked inode.