On Wed, 2022-03-09 at 12:39 -0500, Benjamin Coddington wrote:
> On 27 Feb 2022, at 18:12, trondmy@xxxxxxxxxx wrote:
>
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >
> > The heuristic for readdirplus is designed to try to detect 'ls -l' and
> > similar patterns. It does so by looking for cache hit/miss patterns in
> > both the attribute cache and in the dcache of the files in a given
> > directory, and then sets a flag for the readdirplus code to interpret.
> >
> > The problem with this approach is that a single attribute or dcache
> > miss can cause the NFS code to force a refresh of the attributes for
> > the entire set of files contained in the directory.
> >
> > To be able to make a more nuanced decision, let's sample the number of
> > hits and misses in the set of open directory descriptors. That allows
> > us to set thresholds at which we start preferring READDIRPLUS over
> > regular READDIR, or at which we start to force a re-read of the
> > remaining readdir cache using READDIRPLUS.
>
> I like this patch very much.
>
> The heuristic doesn't kick in until "ls -l" makes its second call into
> nfs_readdir(), and for my filenames with 8 chars, that means that there
> are about 5800 GETATTRs generated before we clean the cache to do more
> READDIRPLUS. That's a large number to compound on connection latency.
>
> We've already got some complaints that folks' 2nd "ls -l" takes "so much
> longer" after 1a34c8c9a49e.
>
> Can we possibly limit our first pass through nfs_readdir() so that the
> heuristic takes effect sooner?

The problem is really that 'ls' (or possibly glibc) is passing in a pretty
huge buffer to the getdents() system call. On my setup, that buffer appears
to be 80K in size.

So what happens is that we get that first getdents() call, and so we fill
the 80K buffer with as many files as will fit. That can quickly run into
several thousand entries, if the filenames are relatively short.
Then 'ls' goes through the contents and does a stat() (or a statx()) on
each entry, and so we record the statistics. However, that means those
first several thousand entries are indeed going to use cached data, or
force a GETATTR to go on the wire. We only start using forced readdirplus
on the second pass.

Yes, I suppose we could have getdents() ignore the buffer size and just
return fewer entries. However, what's the "right" size in that case?

More to the point, how much pain are we going to accept before we give up
trying these assorted heuristics, and just define a readdirplus() system
call modelled on statx()?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx