On Wed, Dec 06, 2023 at 05:05:33PM +1100, Dave Chinner wrote:
> From: Waiman Long <longman@xxxxxxxxxx>
>
> The dlock list needs one list for each of the CPUs available. However,
> for sibling CPUs, they are sharing the L2 and probably L1 caches
> too. As a result, there is not much to gain in term of avoiding
> cacheline contention while increasing the cacheline footprint of the
> L1/L2 caches as separate lists may need to be in the cache.
>
> This patch makes all the sibling CPUs share the same list, thus
> reducing the number of lists that need to be maintained in each
> dlock list without having any noticeable impact on performance. It
> also improves dlock list iteration performance as fewer lists need
> to be iterated.

Seems Waiman was missed on the CC.

It looks like there's some duplication of this with list_lru
functionality - similar list-sharded-by-node idea. list_lru does the
sharding by page_to_nid() of the item, which saves a pointer and allows
just using a list_head in the item. OTOH, it's less granular than what
dlock-list is doing?

I think some attempt ought to be made to factor out the common ideas
here; perhaps reworking list_lru to use this thing, and I hope someone
has looked at the page_nid idea vs. dlock_list using the current core.

But it's nice and small, and I'd like to use it elsewhere.

Reviewed-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
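
P.S. For anyone following along, here is a rough userspace sketch of the
sibling-sharing idea as I read it from the commit message. It is not the
patch's code: the first_sibling[] table, NR_CPUS value and
build_cpu2idx() helper are made up for illustration, and a real
implementation would take the sibling information from the kernel's CPU
topology rather than a hardcoded array.

```c
#define NR_CPUS 8

/*
 * Hypothetical topology (assumption, not from the patch): 2-way SMT
 * where CPU N and CPU N+4 share a core. first_sibling[cpu] is the
 * lowest-numbered CPU in cpu's sibling group.
 */
static const int first_sibling[NR_CPUS] = { 0, 1, 2, 3, 0, 1, 2, 3 };

/*
 * Fill cpu2idx[] so that all sibling CPUs map to the same list index.
 * Returns the number of lists needed, which is the number of sibling
 * groups rather than the number of CPUs.
 */
static int build_cpu2idx(const int *first_sibling, int nr_cpus, int *cpu2idx)
{
	int nr_lists = 0;
	int cpu;

	/* First pass: each group leader gets a fresh list index. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (first_sibling[cpu] == cpu)
			cpu2idx[cpu] = nr_lists++;
	}

	/* Second pass: every CPU takes its group leader's index. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		cpu2idx[cpu] = cpu2idx[first_sibling[cpu]];

	return nr_lists;
}
```

With this made-up topology, 8 CPUs end up needing 4 lists, so an
iteration over the whole structure walks half as many list heads, which
is the iteration win the commit message describes.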