Writeback fixes for NFS

NeilBrown <neilb@xxxxxxx> · Thu, 02 Apr 2020 10:52:21 +1100

Please ignore my previous patch (which this is in reply to), it was
flawed in various ways.

I now understand the code a bit better and have a somewhat simpler patch
which appears to address the same problem.

The problem is that writeback to NFS often produces lots of small writes
(10s of K) rather than fewer large writes (1M).  This pattern can often
hurt throughput, but in certain circumstances it can hurt NFS throughput
more than expected.

Each nfs_writepages() call results in an NFS commit being sent to the
server.  If writeback triggers lots of smaller nfs_writepages calls,
this means lots of COMMITs.  If the server is slow to handle the COMMIT
(I've seen the Ganesha NFS server take over 200ms per commit), these
COMMITs can overlap, queue up, and choke the NFS server and cause
order-of-magnitude drop in throughput.

So we really want to only call nfs_writepages when there are a largish
number of pages to be written - i.e. that are 'dirty'.

For historical reasons that I didn't thoroughly research but I'm
confident are no longer relevant, pages that have been written to the
NFS server but have not yet been the subject of a COMMIT - so-called
"unstable" pages - are effectively accounted that same as "dirty" pages
(sometimes called "reclaimable").
This can result in writeback thinking there are lots of "dirty" pages to
reclaim, while nfs_writepages can only find a few that it can write out.

The second patch following changes the accounting for these "unstable"
pages.  They are now always accounted exactly the same was writeback
pages.  Conceptually they can be thought of as still in writeback, but the
writeback is now happening on the server.

A COMMIT will always automatically follow the writes generated by
nfs_writepages, so from the perspective of the VM, there really is no
difference: It has scheduled the write and there is nothing else it can
do except wait.

Testing this patch showed that loop-back NFS is prone to deadlocks
again.  I cannot see exactly how the change to 'unstable' accounting
affected this, but I can see that the old +25% heuristic can no longer
be justified given the complexity of writeback calculations.
So the first patch following changes how writeback is handled for NFS
servers handling loop-back requests (and other similar services) so that
it is more obviously safe against excessive dirty pages scheduled for
other devices.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature