Re: [PATCH v7 19/21] NFS: Convert readdir page cache to use a cookie based index

On Fri, 2022-02-25 at 07:33 -0500, Benjamin Coddington wrote:
> On 24 Feb 2022, at 23:25, Trond Myklebust wrote:
> 
> > On Fri, 2022-02-25 at 14:17 +1100, NeilBrown wrote:
> 
> > > I haven't looked at the code recently so this might not be 100%
> > > accurate, but XArray generally assumes that pages are often
> > > adjacent.  They don't have to be, but there is a cost.  It uses a
> > > multi-level array with 9 bits per level.  At each level there are
> > > a whole number of pages for indexes to the next level.
> > > 
> > > If there are two entries that are 2^45 apart, that is 5 levels of
> > > indexing that cannot be shared.  So the path to one entry is 5
> > > pages, each of which contains a single pointer.  The path to the
> > > other entry is a separate set of 5 pages.
> > > 
> > > So worst case, the index would be about 64/9 or 7 times the size
> > > of the data.  As the number of data pages increases, this would
> > > shrink slightly, but I suspect you wouldn't get below a factor of
> > > 3 before you fill up all of your memory.
> 
> Yikes!
> 
> > If the problem is just the range, then that is trivial to fix: we
> > can just use xxhash32(), and take the hit of more collisions.
> > However, if the problem is the access pattern, then I have serious
> > questions about the choice of implementation for the page cache. If
> > the cache can't support file random access, then we're barking up
> > the wrong tree on the wrong continent.
> 
> I'm guessing the issue might be "get next", which for an "array" is
> probably the operation tested for "perform well".  We're not doing any
> of that; we're directly addressing pages with our hashed index.
> 
> > Either way, I see avoiding linear searches for cookies as a benefit
> > that is worth pursuing.
> 
> Me too.  What about just kicking the seekdir users up into the second
> half of the index, to use xxhash32() up there?  Everyone else can hang
> out in the bottom half with dense indexes and help each other out.
> 
> The vast majority of readdir() use is going to be short listings
> traversed in order.  The memory inflation created by a process that
> needs to walk a tree, and that for every two pages of readdir data
> requires 10 pages of indexes, seems pretty extreme.
> 
> Ben
> 

#define NFS_READDIR_COOKIE_MASK (U32_MAX >> 14)
/*
 * Hash algorithm allowing content addressable access to sequences
 * of directory cookies. Content is addressed by the value of the
 * cookie index of the first readdir entry in a page.
 *
 * The xxhash algorithm is chosen because it is fast, and is supposed
 * to result in a decent flat distribution of hashes.
 *
 * We then select only the first 18 bits to avoid issues with excessive
 * memory use for the page cache XArray. 18 bits should allow the caching
 * of 262144 pages of sequences of readdir entries. Since each page holds
 * 127 readdir entries for a typical 64-bit system, that works out to a
 * cache of ~ 33 million entries per directory.
 */
static pgoff_t nfs_readdir_page_cookie_hash(u64 cookie)
{
	if (cookie == 0)
		return 0;
	return xxhash(&cookie, sizeof(cookie), 0) & NFS_READDIR_COOKIE_MASK;
}

So no, this is not a show-stopper.


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
