Re: [LSF/MM/BPF TOPIC] allowing parallel directory modifications at the VFS layer

Steve French <smfrench@xxxxxxxxx> · Mon, 3 Feb 2025 11:19:55 -0600



On Tue, Jan 21, 2025 at 7:07 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Tue, 2025-01-21 at 12:20 +1100, Dave Chinner wrote:
> > On Mon, Jan 20, 2025 at 09:25:37AM +1100, NeilBrown wrote:
> > > On Mon, 20 Jan 2025, Dave Chinner wrote:
> > > > On Sat, Jan 18, 2025 at 12:06:30PM +1100, NeilBrown wrote:
> > > > >
> > > > > My question to fs-devel is: is anyone willing to convert their fs (or
> > > > > advice me on converting?) to use the new interface and do some testing
> > > > > and be open to exploring any bugs that appear?
> > > >
> > > > tl;dr: You're asking for people to put in a *lot* of time to convert
> > > > complex filesystems to concurrent directory modifications without
> > > > clear indication that it will improve performance. Hence I wouldn't
> > > > expect widespread enthusiasm to suddenly implement it...
> > >
> > > Thanks Dave!
> > > Your point as detailed below seems to be that, for xfs at least, it may
> > > be better to reduce hold times for exclusive locks rather than allow
> > > concurrent locks.  That seems entirely credible for a local fs but
> > > doesn't apply for NFS as we cannot get a success status before the
> > > operation is complete.
> >
> > How is that different from a local filesystem? A local filesystem
> > can't return from open(O_CREAT) with a struct file referencing a
> > newly allocated inode until the VFS inode is fully instantiated (or
> > failed), either...
> >
> > i.e. this sounds like you want concurrent share-locked dirent ops so
> > that synchronously processed operations can be issued concurrently.
> >
> > Could the NFS client implement asynchronous directory ops, keeping
> > track of the operations in flight without needing to hold the parent
> > i_rwsem across each individual operation? This basically what I've
> > been describing for XFS to minimise parent dir lock hold times.
> >
>
> Yes, basically. The protocol and NFS client have no requirement to
> serialize directory operations. We'd be happy to spray as many at the
> server in parallel as we can get away with. We currently don't do that
> today, largely because the VFS prohibits it.
>
> The NFS server, or exported filesystem may have requirements that
> serialize these operations though.
>
> > What would VFS support for that look like? Is that of similar
> > complexity to implementing shared locking support so that concurrent
> > blocking directory operations can be issued? Is async processing a
> > better model to move the directory ops towards so we can tie
> > userspace directly into it via io_uring?
> >
>
> Given that the VFS requires an exclusive lock today for directory
> morphing ops, moving to a model where we can take a shared lock on the
> directory instead seems like a nice, incremental approach to dealing
> with this problem.
>
> That said, I get your objection. Not being able to upgrade a rwsem
> makes that shared lock kind of nasty for filesystems that actually do
> rely on it for some parts of their work today.
>
> The usual method of dealing with that would be to create a new XFS-only
> per-inode lock that would take over that serialization. The nice thing
> there is that you could (over time) reduce its scope.
>
> > > So it seems likely that different filesystems
> > > will want different approaches.  No surprise.
> > >
> > > There is some evidence that ext4 can be converted to concurrent
> > > modification without a lot of work, and with measurable benefits.  I
> > > guess I should focus there for local filesystems.
> > >
> > > But I don't want to assume what is best for each file system which is
> > > why I asked for input from developers of the various filesystems.

This is an interesting question for SMB3.1.1 as well (cifs.ko etc.), especially
in workloads where the client already has a directory lease on the directory
being updated (and with multichannel there could be quite a bit of i/o in
flight), but even without a directory lease there is no restriction on sending
multiple simultaneous directory related requests to the server


-- 
Thanks,

Steve