Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

On Tue, Jan 30, 2018 at 08:49:27AM +0000, Terry Barnaby wrote:
> On 29/01/18 22:28, J. Bruce Fields wrote:
> > On Mon, Jan 29, 2018 at 08:37:50PM +0000, Terry Barnaby wrote:
> > > Ok, that's a shame, unless NFSv4's write performance with small files/dirs
> > > is relatively ok, which it isn't on my systems.
> > > Although async was "unsafe", this was not an issue in most standard
> > > scenarios, such as an NFS-mounted home directory only being used by one
> > > client.
> > > The async option also does not appear to work when using NFSv3. I guess it
> > > was removed from that protocol at some point as well?
> > This isn't related to the NFS protocol version.
> > 
> > I think everybody's confusing the server-side "async" export option with
> > the client-side mount "async" option.  They're not really related.
> > 
> > The unsafe thing that speeds up file creates is the server-side "async"
> > option.  Sounds like you tried to use the client-side mount option
> > instead, which wouldn't do anything.
> > 
> > > What is the expected sort of write performance when untarring, for example,
> > > the Linux kernel sources? Is 2 MBytes/sec on average on a Gigabit link
> > > typical (3 mins to untar 4.14.15) or should it be better?
> > It's not bandwidth that matters, it's latency.
> > 
> > The file create isn't allowed to return until the server has created the
> > file and the change has actually reached disk.
> > 
> > So an RPC has to reach the server, which has to wait for disk, and then
> > the client has to get the RPC reply.  Usually it's the disk latency that
> > dominates.
> > 
> > And also the final close after the new file is written can't return
> > until all the new file data has reached disk.
> > 
> > v4.14.15 has 61305 files:
> > 
> > 	$ git ls-tree -r  v4.14.15|wc -l
> > 	61305
> > 
> > So time to create each file was about 3 minutes/61305 =~ 3ms.
> > 
> > So assuming two roundtrips per file, your disk latency is probably about
> > 1.5ms?
> > 
> > You can improve the storage latency somehow (e.g. with a battery-backed
> > write cache) or use more parallelism (has anyone ever tried to write a
> > parallel untar?).  Or you can cheat and set the async export option, and
> > then the server will no longer wait for disk before replying.  The
> > problem is that on server reboot/crash, the client's assumptions about
> > which operations succeeded may turn out to be wrong.
> > 
> > --b.
> 
> Many thanks for your reply.
> 
> Yes, I understand the above (latency and the normally synchronous nature of
> NFS). I have async defined in the server's /etc/exports options. I later
> also defined it on the client side, as the async option on the server did
> not appear to be working and I wondered if, with ongoing changes, it had
> been moved there. (It would make some sense for the client to define it and
> pass this option over to the server, as the client knows, in most cases,
> whether the bad aspects of async would be an issue in the situation in
> question.)
> 
> It's a server with large disks, so SSD is not really an option. The use of
> async is ok for my usage (mainly a /home mount, with users' home files only
> in use by one client at a time, etc.).

Note it's not concurrent access that will cause problems; it's server
crashes.  A UPS may reduce the risk a little.
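
For reference, the two settings look roughly like this; a minimal sketch,
assuming an export of /data2 to a local subnet (the path and subnet are
placeholders):

	# /etc/exports on the server -- "async" here is the unsafe-but-fast
	# option: the server replies before the change reaches disk
	/data2  192.168.1.0/24(rw,async,no_subtree_check)

	# re-export and show the effective options
	$ exportfs -ra
	$ exportfs -v

	# the client-side mount option of the same name is unrelated and
	# won't change create latency:
	$ mount -t nfs -o vers=4.2 server:/data2 /mnt/data2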

> However, I have just found that async is actually working! I just did not
> believe it was, due to the poor write performance. Without async on the
> server the performance is truly abysmal. The figures I get for untarring the
> kernel sources (4.14.15, 895 MBytes untarred) using "rm -fr linux-4.14.15;
> sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)" are:
> 
> Untar on server to its local disk:  13 seconds, effective data rate: 68
> MBytes/s
> 
> Untar on client over NFSv4.2 with async on server:  3 minutes, effective
> data rate: 4.9 MBytes/s
> 
> Untar on client over NFSv4.2 without async on server:  2 hours 12 minutes,
> effective data rate: 115 kBytes/s !!

2:12 is 7920 seconds, and you've got 61305 files to write, so that's
about 130ms/file.  That's more than I'd expect even if you're waiting
for a few seeks on each file create, so there may indeed be something
wrong.
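
One way to narrow it down is to measure the server's local synchronous
write latency directly; e.g. (run on the server, against a placeholder
path on the exported filesystem):

	# write 100 4k blocks, syncing each one; total time / 100
	# approximates the commit latency each file create waits for
	$ dd if=/dev/zero of=/data2/tmp/sync-test bs=4k count=100 oflag=dsync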

By comparison on my little home server (Fedora, ext4, a couple WD Black
1TB drives), with sync, that untar takes is 7:44, about 8ms/file.

What's the disk configuration and what filesystem is this?
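
Something like this would show it (the path is a guess):

	$ lsblk -o NAME,SIZE,ROTA,TYPE,MOUNTPOINT   # drives; ROTA=1 means spinning
	$ cat /proc/mdstat                          # software RAID layout, if any
	$ findmnt /data2                            # filesystem and mount options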

> Is it really expected for NFS to be this bad these days with a reasonably
> typical operation, and are there no other tuning parameters that can help?

It's expected that the performance of single-threaded file creates will
depend on latency, not bandwidth.
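
To illustrate the parallelism point mentioned above, a rough sketch (not
a real parallel untar: it stages the tree on local disk, then pushes
regular files over NFS with several workers so multiple creates are in
flight at once; symlinks and empty directories would need extra handling):

	$ mkdir /tmp/stage && tar -xf linux-4.14.15.tar.gz -C /tmp/stage
	$ cd /tmp/stage
	# 8 workers, 64 files per cp invocation
	$ find . -type f | xargs -P 8 -n 64 cp --parents -t /data2/tmp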

I believe high-performance servers use battery backed write caches with
storage behind them that can do lots of IOPS.

(One thing I've been curious about is whether you could get better
performance cheaply on this kind of workload with ext3/4 striped across a
few drives and an external journal on SSD.  But when I experimented with
that a few years ago I found synchronous write latency wasn't much
better.  I didn't investigate why not; maybe that's just the way SSDs
are.)
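
For anyone repeating that experiment, the setup would be roughly the
following (device names are placeholders):

	# format a small SSD partition as an external journal device
	$ mke2fs -O journal_dev /dev/sdX1
	# create ext4 on the striped array, pointing it at that journal
	$ mkfs.ext4 -J device=/dev/sdX1 /dev/md0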

--b.