On Mon, Feb 05, 2024 at 02:22:18PM +1100, Dave Chinner wrote: > Intuition tells me that what the OP is seeing is the opposite case > to above: there is significant contention on the lock. In that case, > optimal "contention performance" comes from separating the lock and > the objects it protects into different cachelines. > > The reason for this is that if the lock and objects it protects are > on the same cacheline, lock contention affects both the lock and the > objects being manipulated inside the critical section. i.e. attempts > to grab the lock pull the cacheline away from the CPU that holds the > lock, and then accesses to the object that are protected by the lock > then have to pull the cacheline back. > > i.e. the cost of the extra memory fetch from an uncontended > cacheline is less than the cost of having to repeatedly fetch the > memory inside a critical section on a contended cacheline. > > I consider optimisation attempts like this the canary in the mine: > it won't be long before these or similar workloads report > catastrophic lock contention on the lock in question. Moving items > in the structure is equivalent to re-arranging the deck chairs > whilst the ship sinks - we might keep our heads above water a > little longer, but the ship is still sinking and we're still going > to have to fix the leak sooner rather than later... So the fundamental problem is our data structure. It's actually two problems wrapped up in one bad idea. i_mmap is a struct rb_root_cached: struct rb_root_cached { struct rb_root rb_root; struct rb_node *rb_leftmost; }; struct rb_root { struct rb_node *rb_node; }; so it's two pointers into the tree; one to the leftmost node, one to the topmost node. That means we're modifying one or both of these pointers frequently. I imagine it's the rb_root, which is being modified frequently because we're always ... appending to the right? And so we're rotating frequently. Worst case would be if we're appending to the left and modifying both pointers. There are things we could do to address this by making rotations less frequent for the common case, which I suspect is mapping the entire file. And perhaps we should do these things as a stopgap. The larger problem is that rbtrees suck. They suck the same way that linked lists suck; the CPU can't prefetch very far ahead (or in this case down). It can do a little more work in that it can prefetch both the left & right pointers, but it can't fetch the grandchildren until the children fetches have finished, so the dependent reads have us stumped. The solution to this problem is to change the interval tree data structure from an Red-Black tree to a B-tree, or something similar where we use an array of pointers instead of a single pointer. Not the Maple Tree; that is designed for non-overlapping ranges. One could take inspiration from the Maple Tree and design a very similar data structure that can store and iterate over overlapping ranges. I can't understand the macros this late at night, so I don't fully understand what the interval tree is doing, but I can probably make a more fully informed observation by the end of the week.