Re: [PATCH 01/10] VFS: support parallel updates in the one directory.

"NeilBrown" <neilb@xxxxxxx> · Thu, 01 Sep 2022 10:31:10 +1000

On Sat, 27 Aug 2022, Al Viro wrote:
> On Fri, Aug 26, 2022 at 12:06:55PM -0700, Linus Torvalds wrote:
> 
> > Because right now I think the main reason we cannot move the lock into
> > the filesystem is literally that we've made the lock cover not just
> > the filesystem part, but the "lookup and create dentry" part too.
> 
> How about rename loop prevention?  mount vs. rmdir?  d_splice_alias()
> fun on tree reconnects?

Thanks for this list.

I think the "mount vs. rmdir" usage of inode_lock() is independent of
the usage for directory operations, so we can change the latter as much
as we like without materially affecting the former.

The lock we take on the directory being removed DOES ensure no new
objects are linked into the directory, so for that reason we still need
at least a shared lock when adding links to a directory.
Moving that lock into the filesystem would invert the locking order in
rmdir between the child being removed and the parent being locked.  That
would require some consideration.

d_splice_alias() happens at ->lookup time so it is already under a
shared lock.  I don't see that it depends on i_rwsem - it uses i_lock
for the important locking.

Rename loop prevention is largely managed by s_vfs_rename_mutex.  Once
that is taken, nothing can be moved to a different directory.  That
means 'trap' will keep any relationship it had to new_path and old_path.
It could be renamed within its parent, but as long as it isn't removed
the comparisons with old_dentry and new_dentry should still be reliable.
As 'trap' clearly isn't empty, we trust that the filesystem won't allow
an rmdir to succeed.
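
To make the 'trap' comparisons concrete, here is a rough user-space sketch
of the check. It is a toy model, not kernel code: struct toy_dentry,
toy_d_ancestor(), toy_trap() and toy_rename_check() are hypothetical names
loosely modelled on d_ancestor(), lock_rename() and the checks that follow
it in the rename path. All locking is left out, and the sketch assumes the
tree shape cannot change while it runs, which is the guarantee
s_vfs_rename_mutex is relied on for above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a dentry: only the parent link matters here. */
struct toy_dentry {
	struct toy_dentry *parent;	/* the root points to itself */
};

/*
 * Loosely modelled on d_ancestor(): if p1 is a proper ancestor of p2,
 * return the child of p1 that lies on the path down to p2, else NULL.
 */
static struct toy_dentry *toy_d_ancestor(struct toy_dentry *p1,
					 struct toy_dentry *p2)
{
	struct toy_dentry *p;

	for (p = p2; p->parent != p; p = p->parent)
		if (p->parent == p1)
			return p;
	return NULL;
}

/*
 * Which dentry would be the 'trap' for a rename between two parent
 * directories.  Locking is omitted entirely (assumption of this sketch).
 */
static struct toy_dentry *toy_trap(struct toy_dentry *new_dir,
				   struct toy_dentry *old_dir)
{
	struct toy_dentry *p;

	assert(new_dir && old_dir);
	if (new_dir == old_dir)
		return NULL;		/* same directory: no loop possible */
	p = toy_d_ancestor(old_dir, new_dir);
	if (p)
		return p;
	return toy_d_ancestor(new_dir, old_dir);
}

/* The two comparisons against 'trap' that reject a rename loop. */
static int toy_rename_check(struct toy_dentry *old_dentry,
			    struct toy_dentry *new_dentry)
{
	struct toy_dentry *trap = toy_trap(new_dentry->parent,
					   old_dentry->parent);

	if (old_dentry == trap)
		return -EINVAL;		/* source is an ancestor of target */
	if (new_dentry == trap)
		return -ENOTEMPTY;	/* target is an ancestor of source */
	return 0;
}
```

In this model, moving /a/b to /a/b/c/x makes 'b' the trap and is rejected
with -EINVAL, while moving /a/b/c/x over /a/b is rejected with -ENOTEMPTY.
Neither answer changes if 'b' is renamed within /a, because only parent
links are consulted.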

What have I missed?

Thanks,
NeilBrown

> 
> > But once you have that "DCACHE_PAR_LOOKUP" bit and the
> > d_alloc_parallel() logic to serialize a _particular_ dentry being
> > created (as opposed to serializing all the sleeping ops to that
> > directly), I really think we should strive to move the locking - that
> > no longer helps the VFS dcache layer - closer to the filesystem call
> > and eventually into it.
> 
> See above.
>