On Fri, 2022-02-25 at 07:33 -0500, Benjamin Coddington wrote:
> On 24 Feb 2022, at 23:25, Trond Myklebust wrote:
>
> > On Fri, 2022-02-25 at 14:17 +1100, NeilBrown wrote:
> > >
> > > I haven't looked at the code recently so this might not be 100%
> > > accurate, but XArray generally assumes that pages are often
> > > adjacent. They don't have to be, but there is a cost. It uses a
> > > multi-level array with 9 bits per level. At each level there are
> > > a whole number of pages for indexes to the next level.
> > >
> > > If there are two entries that are 2^45 apart, that is 5 levels of
> > > indexing that cannot be shared. So the path to one entry is 5
> > > pages, each of which contains a single pointer. The path to the
> > > other entry is a separate set of 5 pages.
> > >
> > > So worst case, the index would be about 64/9, i.e. 7 times the
> > > size of the data. As the number of data pages increases, this
> > > would shrink slightly, but I suspect you wouldn't get below a
> > > factor of 3 before you fill up all of your memory.
>
> Yikes!
>
> > If the problem is just the range, then that is trivial to fix: we
> > can just use xxhash32(), and take the hit of more collisions.
> > However, if the problem is the access pattern, then I have serious
> > questions about the choice of implementation for the page cache.
> > If the cache can't support random access within a file, then we're
> > barking up the wrong tree on the wrong continent.
>
> I'm guessing the issue might be "get next", which for an "array" is
> probably the operation tested to "perform well". We're not doing any
> of that; we're directly addressing pages with our hashed index.
>
> > Either way, I see avoiding linear searches for cookies as a
> > benefit that is worth pursuing.
>
> Me too. What about just kicking the seekdir users up into the second
> half of the index, to use xxhash32() up there? Everyone else can
> hang out in the bottom half with dense indexes and help each other
> out.
>
> The vast majority of readdir() use is going to be short listings
> traversed in order. The memory inflation created by a process that
> walks a tree and, for every two pages of readdir data, requires 10
> pages of indexes seems pretty extreme.
>
> Ben

#define NFS_READDIR_COOKIE_MASK	(U32_MAX >> 14)
/*
 * Hash algorithm allowing content addressable access to sequences
 * of directory cookies. Content is addressed by the value of the
 * cookie index of the first readdir entry in a page.
 *
 * The xxhash algorithm is chosen because it is fast, and is supposed
 * to result in a decent flat distribution of hashes.
 *
 * We then select only the first 18 bits to avoid issues with
 * excessive memory use for the page cache XArray. 18 bits should
 * allow the caching of 262144 pages of sequences of readdir entries.
 * Since each page holds 127 readdir entries for a typical 64-bit
 * system, that works out to a cache of ~33 million entries per
 * directory.
 */
static pgoff_t nfs_readdir_page_cookie_hash(u64 cookie)
{
	if (cookie == 0)
		return 0;
	return xxhash(&cookie, sizeof(cookie), 0) & NFS_READDIR_COOKIE_MASK;
}

So no, this is not a show-stopper.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
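
To put numbers on the worst case Neil describes, here is a minimal
userspace sketch. It assumes 9 index bits per tree level as in his
description (BITS_PER_LEVEL is purely illustrative; the real XArray
node size may differ):

/*
 * Sketch of the worst-case index overhead for a radix tree with
 * BITS_PER_LEVEL index bits per level, per Neil's description above.
 * Uses the GCC/Clang builtin __builtin_clzll().
 */
#include <stdio.h>

#define BITS_PER_LEVEL 9

/*
 * Number of tree levels below the point where the paths to two
 * indexes diverge; each such level costs one private node (one page)
 * per index.
 */
static int unshared_levels(unsigned long long a, unsigned long long b)
{
	if (a == b)
		return 0;
	int msb = 63 - __builtin_clzll(a ^ b); /* highest differing bit */
	return msb / BITS_PER_LEVEL;
}

int main(void)
{
	unsigned long long a = 0, b = 1ULL << 45;
	int n = unshared_levels(a, b);

	printf("unshared levels per path: %d\n", n);
	printf("index pages for the pair: >= %d\n", 2 * n);
	return 0;
}

For two entries 2^45 apart this prints 5 unshared levels per path and
at least 10 index pages for 2 pages of data, which is exactly the
"two pages of readdir data require 10 pages of indexes" inflation Ben
objects to.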
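For anyone who wants to eyeball the resulting distribution, here is a
userspace approximation of nfs_readdir_page_cookie_hash(). It assumes
the xxHash library's XXH64() computes the same value as the kernel's
xxhash() on a 64-bit system (both are the seeded 64-bit xxHash), and
the test cookies are made up. Build with -lxxhash:

/*
 * Userspace approximation of nfs_readdir_page_cookie_hash(), with
 * the xxHash library's XXH64() standing in for the kernel's xxhash().
 */
#include <stdio.h>
#include <stdint.h>
#include <xxhash.h>

#define NFS_READDIR_COOKIE_MASK	(UINT32_MAX >> 14)	/* 18 bits */

static uint64_t cookie_hash(uint64_t cookie)
{
	if (cookie == 0)
		return 0;	/* cookie 0 always maps to page index 0 */
	return XXH64(&cookie, sizeof(cookie), 0) & NFS_READDIR_COOKIE_MASK;
}

int main(void)
{
	/* Arbitrary, widely separated cookie values. */
	uint64_t cookies[] = { 1, 2, 3, 1ULL << 45, ~0ULL };

	for (size_t i = 0; i < sizeof(cookies) / sizeof(cookies[0]); i++)
		printf("cookie %llu -> page index %llu\n",
		       (unsigned long long)cookies[i],
		       (unsigned long long)cookie_hash(cookies[i]));
	return 0;
}

Cookies that are 2^45 apart land in the same dense 2^18 page-index
range, so the XArray stays shallow; presumably a collision between
two cookies just costs a page re-fill, which is the "hit of more
collisions" accepted above.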