Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`

Wedson Almeida Filho <wedsonaf@xxxxxxxxx> · Mon, 23 Oct 2023 09:55:08 -0300

On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sat, Oct 21, 2023 at 12:33:57PM -0700, Boqun Feng wrote:
> > On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote:
> > > I'm only an expert on the page cache, not the rest of the VFS.  So
> > > what are the rules around modifying i_state for the VFS?
> >
> > Agreed, same question here.
>
> inode->i_state should only be modified under inode->i_lock.
>
> And in most situations, you have to hold the inode->i_lock to read
> state flags as well so that reads are serialised against
> modifications which are typically non-atomic RMW operations.
>
> There is, I think, one main exception to read side locking and this
> is find_inode_rcu() which does an unlocked check for I_WILL_FREE |
> I_FREEING. In this case, the inode->i_state updates in iput_final()
> use WRITE_ONCE under the inode->i_lock to provide the necessary
> semantics for the unlocked READ_ONCE() done under rcu_read_lock().
>
> IOWs, if you follow the general rule that any inode->i_state access
> (read or write) needs to hold inode->i_lock, you probably won't
> screw up.

I don't see filesystems doing this though. In particular, see
iget_locked() -- if a new inode is returned, then it is locked, but if
a cached one is found, it's not locked.

So we're in this situation where a returned inode may or may not be
locked. And the way to determine if it's locked or not is to read
i_state.

Here are examples of kernfs, ext2, ext4 and squashfs doing it:
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/kernfs/inode.c#L252
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext2/inode.c#L1392
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext4/inode.c#L4707
https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/squashfs/inode.c#L82

They all call iget_locked(), and if I_NEW is set, they initialise the
inode and unlock it with unlock_new_inode(); otherwise they just
return the unlocked inode.