Re: [PATCH v7 19/21] NFS: Convert readdir page cache to use a cookie based index

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 25 Feb 2022 04:25:03 +0000

On Fri, 2022-02-25 at 14:17 +1100, NeilBrown wrote:
> On Fri, 25 Feb 2022, Trond Myklebust wrote:
> > On Thu, 2022-02-24 at 12:31 -0500, Benjamin Coddington wrote:
> > > 
> > > "The XArray implementation is efficient when the indices used are
> > > densely
> > > clustered; hashing the object and using the hash as the index
> > > will
> > > not
> > > perform well."
> > > 
> > > However, the "not perform well" may be orders of magnitude
> > > smaller
> > > than
> > > anthing like RPC.  Do you have concerns about this?
> > 
> > What is the difference between this workload and a random access
> > database workload?
> 
> Probably the range of expected addresses.
> If I understand the proposal correctly, the page addresses in this
> workload could be any 64bit number.
> For a large database, it would be at most 52 bits (assuming 64bits
> worth
> of bytes), and very likely substantially smaller - maybe 40 bits for
> a
> really really big database.
> 
> > 
> > If the XArray is incapable of dealing with random access, then we
> > should never have chosen it for the page cache. I'm therefore
> > assuming
> > that either the above comment is referring to micro-optimisations
> > that
> > don't matter much with these workloads, or else that the plan is to
> > replace the XArray with something more appropriate for a page cache
> > workload.
> 
> I haven't looked at the code recently so this might not be 100%
> accurate, but XArray generally assumes that pages are often adjacent.
> They don't have to be, but there is a cost.
> It uses a multi-level array with 9 bits per level.  At each level
> there
> are a whole number of pages for indexes to the next level.
> 
> If there are two entries, that are 2^45 separate, that is 5 levels of
> indexing that cannot be shared.  So the path to one entry is 5 pages,
> each of which contains a single pointer.  The path to the other entry
> is
> a separate set of 5 pages.
> 
> So worst case, the index would be about 64/9 or 7 times the size of
> the
> data.  As the number of data pages increases, this would shrink
> slightly, but I suspect you wouldn't get below a factor of 3 before
> you
> fill up all of your memory.
> 

If the problem is just the range, then that is trivial to fix: we can
just use xxhash32(), and take the hit of more collisions. However if
the problem is the access pattern, then I have serious questions about
the choice of implementation for the page cache. If the cache can't
support file random access, then we're barking up the wrong tree on the
wrong continent.

Either way, I see avoiding linear searches for cookies as a benefit
that is worth pursuing.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx