Re: [PATCH v7 19/21] NFS: Convert readdir page cache to use a cookie based index

"NeilBrown" <neilb@xxxxxxx> · Fri, 25 Feb 2022 14:17:49 +1100

On Fri, 25 Feb 2022, Trond Myklebust wrote:
> On Thu, 2022-02-24 at 12:31 -0500, Benjamin Coddington wrote:
> > 
> > "The XArray implementation is efficient when the indices used are
> > densely
> > clustered; hashing the object and using the hash as the index will
> > not
> > perform well."
> > 
> > However, the "not perform well" may be orders of magnitude smaller
> > than
> > anthing like RPC.  Do you have concerns about this?
> 
> What is the difference between this workload and a random access
> database workload?

Probably the range of expected addresses.
If I understand the proposal correctly, the page addresses in this
workload could be any 64bit number.
For a large database, it would be at most 52 bits (assuming 64bits worth
of bytes), and very likely substantially smaller - maybe 40 bits for a
really really big database.

> 
> If the XArray is incapable of dealing with random access, then we
> should never have chosen it for the page cache. I'm therefore assuming
> that either the above comment is referring to micro-optimisations that
> don't matter much with these workloads, or else that the plan is to
> replace the XArray with something more appropriate for a page cache
> workload.

I haven't looked at the code recently so this might not be 100%
accurate, but XArray generally assumes that pages are often adjacent.
They don't have to be, but there is a cost.
It uses a multi-level array with 9 bits per level.  At each level there
are a whole number of pages for indexes to the next level.

If there are two entries, that are 2^45 separate, that is 5 levels of
indexing that cannot be shared.  So the path to one entry is 5 pages,
each of which contains a single pointer.  The path to the other entry is
a separate set of 5 pages.

So worst case, the index would be about 64/9 or 7 times the size of the
data.  As the number of data pages increases, this would shrink
slightly, but I suspect you wouldn't get below a factor of 3 before you
fill up all of your memory.

NeilBrown