On Wed, 2022-03-09 at 12:39 -0500, Benjamin Coddington wrote:
> On 27 Feb 2022, at 18:12, trondmy@xxxxxxxxxx wrote:
>
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >
> > The heuristic for readdirplus is designed to try to detect 'ls -l' and
> > similar patterns. It does so by looking for cache hit/miss patterns in
> > both the attribute cache and in the dcache of the files in a given
> > directory, and then sets a flag for the readdirplus code to interpret.
> >
> > The problem with this approach is that a single attribute or dcache
> > miss can cause the NFS code to force a refresh of the attributes for
> > the entire set of files contained in the directory.
> >
> > To be able to make a more nuanced decision, let's sample the number of
> > hits and misses in the set of open directory descriptors. That allows
> > us to set thresholds at which we start preferring READDIRPLUS over
> > regular READDIR, or at which we start to force a re-read of the
> > remaining readdir cache using READDIRPLUS.
>
> I like this patch very much.
>
> The heuristic doesn't kick in until "ls -l" makes its second call into
> nfs_readdir(), and for my filenames with 8 chars, that means that there
> are about 5800 GETATTRs generated before we clean the cache to do more
> READDIRPLUS. That's a large number to compound on connection latency.
>
> We've already got some complaints that folks' 2nd "ls -l" takes "so much
> longer" after 1a34c8c9a49e.
>
> Can we possibly limit our first pass through nfs_readdir() so that the
> heuristic takes effect sooner?

The problem is really that 'ls' (or possibly glibc) is passing in a pretty
huge buffer to the getdents() system call. On my setup, that buffer appears
to be 80K in size.

So what happens is that we get that first getdents() call, and so we fill
the 80K buffer with as many files as will fit. That can quickly run into
several thousand entries, if the filenames are relatively short.
Then 'ls' goes through the contents and does a stat() (or a statx()) on
each entry, and so we record the statistics. However, that means those
first several thousand entries are indeed going to use cached data, or
force a GETATTR to go on the wire. We only start using forced readdirplus
on the second pass.

Yes, I suppose we could have getdents() ignore the buffer size and just
return fewer entries. However, what's the "right" size in that case?

More to the point, how much pain are we going to accept before we give up
trying these assorted heuristics, and just define a readdirplus() system
call modelled on statx()?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx