On Wed, Feb 07, 2024 at 08:35:59AM +1100, Dave Chinner wrote: > > The solution to this problem is to change the interval tree data structure > > from an Red-Black tree to a B-tree, or something similar where we use > > an array of pointers instead of a single pointer. > > .... B-trees are not immune to pointer chasing problems, either. > Most use binary searches within the nodes, and so we have the > problem of unpredictable cacheline misses within the nodes as well > as being unable to do depth based prefetch similar to rbtrees. > > Perhaps we should be looking at something like this: > > https://lore.kernel.org/linux-fsdevel/20240126220655.395093-2-kent.overstreet@xxxxxxxxx/ I need more data (and maybe Kent has it!) Without any data, I believe that Eytzinger layout is an idea that was a really good one in the 1990s/2000s and has now passed its expiration date because of the changed nature of hardware. In the mid-90s, we could do O(10) instructions in the time it took to service a LLC miss. Today, it's more like O(2500). That means it is far more important to be predictable than it is to execute the minimum number of instructions. If your B-tree nodes are 256kB in size (I believe that's what bacachefs is using?), then Eytzinger layout may make some sense, but if you're using smaller nodes (which I further believe is appropriate for an in-memory B-tree), then a straight 'load each index and compare it' may outperform a search that jumps around inside a node. I'm pontificating and will happily yield to someone who has data. I've tried to mark my assumptions clearly above. Something else I'm not aware of (and filesystem B-trees will not have any experience of) is what research exists on efficiently adding new entries to a balanced tree so as to minimise rebalances. Filesystems are like the Maple Tree in that for every logical offset inside a file, there is precisely one answer to "what physical block does this map to". For the i_mmap tree, we want instead to answer the question "Which VMAs have an intersection with this range of the file", and for the benchmark in question, we will have a large number of entries that compare equal to each other (they have the same start, and same end, but different values). So they could be inserted at many different points in the tree. We'd like to choose the point which causes the least amount of rebalancing.