On Tue, Jan 14, 2025 at 07:55:30AM +1100, Dave Chinner wrote:
> The idea behind the initial cacheline layout is that it should stay
> read-only as much as possible so that cache lookups can walk the
> buffer without causing shared/exclusive cacheline contention with
> existing buffer users.
>
> This was really important back in the days when the cache used a
> rb-tree (i.e. the rbnode pointers dominated lookup profiles), and
> it's still likely important with the rhashtable on large caches.
>
> i.e. Putting a spinlock in that first cache line will result in
> lookups and shrinker walks having cacheline contention as the
> shrinker needs exclusive access for the spin lock, whilst the lookup
> walk needs shared access for the b_rhash_head, b_rhash_key and
> b_length fields in _xfs_buf_obj_cmp() for lockless lookup
> concurrency.

Hmm, this contradicts the comment on top of xfs_buf, which explicitly
wants the lock and count in the semaphore to stay in the first cache
line.  Both of these are modified for the uncontended lookup there: the
count, which already sits in that cache line, and the newly moved lock
(which would still leave the semaphore with a split layout).

Note that since the comment was written, b_sema actually moved entirely
into the first cache line, and this patch keeps it there, nicely
aligning b_lru_ref on my x86_64 no-debug config.

Now I'm usually pretty bad at these cacheline micro-optimizations, and
I'm talking to the person who wrote that comment here, so that
rationale might not make sense, but then the comment doesn't either.
I'm kinda tempted to just stick to the rationale there for now and then
let someone smarter than me optimize the layout for the new world
order.