On Mon, 25 Apr 2022 at 17:02, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Mon, Apr 25, 2022 at 04:24:50PM +0100, Daire Byrne wrote:
> > On Mon, 25 Apr 2022 at 14:22, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Apr 25, 2022 at 02:00:32PM +0100, Daire Byrne wrote:
> > > > On Mon, 21 Feb 2022 at 13:59, Daire Byrne <daire@xxxxxxxx> wrote:
> > > > >
> > > > > On Fri, 18 Feb 2022 at 07:46, NeilBrown <neilb@xxxxxxx> wrote:
> > > > > > I've ported it to mainline without much trouble. I started some simple
> > > > > > testing (parallel create/delete of the same file) and hit a bug quite
> > > > > > easily. I fixed that (eventually) and then tried with more than 1 CPU,
> > > > > > and hit another bug. But then it was quitting time. If I can get rid
> > > > > > of all the easy to find bugs, I'll post it with a CC to you, and you can
> > > > > > find some more for me!
> > > > >
> > > > > That would be awesome! I have a real world production case for this
> > > > > and it's a pretty heavy workload. If that doesn't shake out any bugs,
> > > > > nothing will.
> > > > >
> > > > > The only caveat being that it will likely be restricted to NFSv3
> > > > > testing due to the concurrency limitations with NFSv4.1+ (from the
> > > > > other thread).
> > > > >
> > > > > Daire
> > > >
> > > > Just to follow up on this again - I have been using Neil's patch for
> > > > parallel file creates (thanks!) but I'm a bit confused as to why it
> > > > doesn't seem to help in my NFS re-export case.
> > > >
> > > > With the patch, I can achieve much higher parallel (multi process)
> > > > creates directly on my re-export server to a high latency remote
> > > > server mount, but when I re-export that to multiple clients, the
> > > > aggregate create rate again degrades to that which we might expect
> > > > either without the patch or if there was only one process creating the
> > > > files in sequence.
> > > >
> > > > My assumption was that the nfsd threads of the re-export server would
> > > > act as multiple independent processes and its clients would be spread
> > > > across them such that they would also benefit from the parallel
> > > > creates patch on the re-export server. So I expected many clients
> > > > creating files in the same directory would achieve much higher
> > > > aggregate performance.
> > >
> > > That's the idea.
> > >
> > > I've lost track, where's the latest version of Neil's patch?
> > >
> > > --b.
> >
> > The latest is still the one from this thread (with a minor update to
> > apply it to v5.18-rc):
> >
> > https://lore.kernel.org/lkml/893053D7-E5DD-43DB-941A-05C10FF5F396@xxxxxxxxx/T/#m922999bf830cacb745f32cc464caf72d5ffa7c2c
>
> Thanks!
>
> I haven't really tried to understand that patch--but just looking at the
> diffstat, it doesn't touch fs/nfsd/. And nfsd calls into the vfs only
> after it locks the parent. So nfsd is probably still using
> the old behavior, where local callers are using the new (parallel)
> behavior.
>
> So I bet what you're seeing is expected, and all that's needed is some
> updates to fs/nfsd/vfs.c to reflect whatever Neil did in fs/namei.c.
>
> --b.

Ah right, that would explain it then - thanks.

I just naively assumed that nfsd would pass straight into the VFS and
rely on those locks. I'll stare at fs/nfsd/vfs.c for a bit, but I
probably lack the expertise to make it work.

It's also not entirely clear whether this parallel creates RFC patch
will ever make it into mainline.
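Just so I know roughly what I'd be staring at: here's my (quite
possibly wrong) mental model of the nfsd create path, boiled down to a
sketch. This is not the real fs/nfsd/vfs.c code - the function name,
signature and error handling are made up/simplified, and I'm assuming a
v5.18-era tree where vfs_create() still takes a struct user_namespace -
but it shows the "take the parent's i_rwsem exclusively around lookup +
create" pattern that I understand Bruce to be describing, which would
serialise every create in a directory no matter how many nfsd threads
are servicing clients:

  #include <linux/fs.h>
  #include <linux/namei.h>
  #include <linux/dcache.h>
  #include <linux/err.h>
  #include <linux/user_namespace.h>

  /* Illustrative sketch only - not the actual nfsd code. */
  static int sketch_nfsd_create(struct dentry *parent, const char *fname,
                                int flen, umode_t mode)
  {
          struct inode *dir = d_inode(parent);
          struct dentry *dchild;
          int err;

          /*
           * Exclusive lock on the directory: every other create in this
           * directory waits here, whichever nfsd thread it arrived on.
           */
          inode_lock_nested(dir, I_MUTEX_PARENT);

          dchild = lookup_one_len(fname, parent, flen);
          if (IS_ERR(dchild)) {
                  err = PTR_ERR(dchild);
                  goto out_unlock;
          }

          err = vfs_create(&init_user_ns, dir, dchild, mode, true);
          dput(dchild);
  out_unlock:
          inode_unlock(dir);
          return err;
  }

If that's roughly right, then I guess the work would be converting this
sort of path over to whatever shared/per-name locking Neil's patch
introduced in fs/namei.c, rather than the whole-directory exclusive
lock.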
Daire

> > My test is something like this:
> >
> > reexport1 # for x in {1..5000}; do
> >     echo /srv/server1/touch.$HOSTNAME.$x
> > done | xargs -n1 -P 200 -iX -t touch X 2>&1 | pv -l -a >|/dev/null
> >
> > Without the patch this results in 3 creates/s and with the patch it's
> > ~250 creates/s with 200 threads/processes (200ms latency) when run
> > directly against a remote RHEL8 server (server1).
> >
> > Then I run something similar to this but simultaneously across 200
> > clients of the "reexport1" server's re-export of the originating
> > "server1". I get an aggregate of around 3 creates/s even with the
> > patch applied to reexport1 (v5.18-rc2) which is suspiciously similar
> > to the performance without the parallel vfs create patch.
> >
> > The clients don't run any special kernels or configurations. I have
> > only tested NFSv3 so far.