On Wed, 2022-02-23 at 08:34 -0500, Benjamin Coddington wrote:
> On 23 Feb 2022, at 7:17, Trond Myklebust wrote:
> 
> > On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
> > > On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
> > > 
> > > > On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
> > > > > On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
> > > > > 
> > > > > > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
> > > > > > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> > > > > > > 
> > > > > > > > We will always need the ability to cut over to uncached
> > > > > > > > readdir.
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > > If the cookie is no longer returned by the server because
> > > > > > > > one or more files were deleted then we need to resolve
> > > > > > > > the situation somehow (IOW: the 'rm *' case). The new
> > > > > > > > algorithm _does_ improve performance in those situations,
> > > > > > > > because it no longer requires us to read the entire
> > > > > > > > directory before switching over: we try 5 times, then
> > > > > > > > fail over.
> > > > > > > 
> > > > > > > Yes, using per-page validation doesn't remove the need for
> > > > > > > uncached readdir. It does allow a reader to simply resume
> > > > > > > filling the cache where it left off. There's no need to
> > > > > > > try 5 times and fail over. And there's no need to make a
> > > > > > > trade-off and make the situation worse in certain
> > > > > > > scenarios.
> > > > > > > 
> > > > > > > I thought I'd point that out and make an offer to
> > > > > > > re-submit it. Any interest?
> > > > > > As I recall, I had concerns about that approach. Can you
> > > > > > explain again how it will work?
> > > > > 
> > > > > Every page of readdir results has the directory's change attr
> > > > > stored on the page. That, along with the page's index and the
> > > > > first cookie, is enough information to determine if the page's
> > > > > data can be used by another process.
> > > > > 
> > > > > Which means that when the pagecache is dropped, fillers don't
> > > > > have to restart filling the cache at page index 0, they can
> > > > > continue to fill at whatever index they were at previously.
> > > > > If another process finds a page that doesn't match its page
> > > > > index, cookie, and the current directory's change attr, the
> > > > > page is dropped and refilled from that process' indexing.
> > > > > 
> > > > > > A few of the concerns I have revolve around
> > > > > > telldir()/seekdir(). If the platform is 32-bit, then we
> > > > > > cannot use cookies as the telldir() output, and so our only
> > > > > > option is to use offsets into the page cache (this is why
> > > > > > this patch carves out an exception if desc->dir_cookie ==
> > > > > > 0). How would that work with your scheme?
> > > > > 
> > > > > For 32-bit seekdir, pages are filled starting at index 0.
> > > > > This is very unlikely to match other readers (unless they
> > > > > also do the _same_ seekdir).
> > > > > 
> > > > > > Even in the 64-bit case where we are able to use cookies
> > > > > > for telldir()/seekdir(), how do we determine an appropriate
> > > > > > page index after a seekdir()?
> > > > > 
> > > > > We don't.
> > > > > Instead we start filling at index 0. Again, the pagecache
> > > > > will only be useful to other processes that have done the
> > > > > same llseek.
> > > > > 
> > > > > This approach optimizes the pagecache for processes that are
> > > > > doing similar work, and has the added benefit of scaling well
> > > > > for large directory listings under memory pressure. Also a
> > > > > number of classes of directory modifications (such as renames,
> > > > > or insertions/deletions at locations a reader has moved
> > > > > beyond) are no longer a reason to re-fill the pagecache from
> > > > > scratch.
> > > > 
> > > > OK, you've got me more or less sold on it.
> > > > 
> > > > I'd still like to figure out how to improve the performance for
> > > > seekdir (since I do have an interest in re-exporting NFS) but
> > > > I've been playing around with a couple of patches that
> > > > implement your concept and they do seem to work well for the
> > > > common case of a linear read through the directory.
> > > 
> > > Nice. I have another version from the one I originally posted:
> > > https://lore.kernel.org/linux-nfs/cover.1611160120.git.bcodding@xxxxxxxxxx/
> > > 
> > > .. but I don't remember exactly the changes and it needs
> > > rebasing. Should I rebase it against your testing branch and send
> > > the result?
> > 
> > My 2 patches did something slightly different to yours, storing the
> > change attribute in the array header instead of in page_private.
> > That makes for a slightly smaller change.
> 
> I worried that increasing the size of the array header wouldn't
> allow us to store as many entries per page.

The struct nfs_cache_array header is 24 bytes long with the change
attribute (as opposed to 16 bytes without it).
This size is independent of the architecture, assuming that 'unsigned
int' is 32 bits and unsigned char is 8 bits (as is always the case on
Linux).

On a 64-bit system, a single struct nfs_cache_array_entry ends up
being 32 bytes long. So in a standard 4k page with a 24-byte or
16-byte header you will be able to cache exactly 127 cache array
entries. On a 32-bit system, the cache entry is 28 bytes long (the
difference being the pointer length), and you can pack 145 entries
into the 4k page.

IOW: the change in header size makes no difference to the number of
entries you can cache, because in both cases the header remains
smaller than a single entry.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx