On Wed, Dec 06, 2023 at 05:05:33PM +1100, Dave Chinner wrote:
> From: Waiman Long <longman@xxxxxxxxxx>
>
> The dlock list needs one list for each of the CPUs available. However,
> for sibling CPUs, they are sharing the L2 and probably L1 caches
> too. As a result, there is not much to gain in term of avoiding
> cacheline contention while increasing the cacheline footprint of the
> L1/L2 caches as separate lists may need to be in the cache.
>
> This patch makes all the sibling CPUs share the same list, thus
> reducing the number of lists that need to be maintained in each
> dlock list without having any noticeable impact on performance. It
> also improves dlock list iteration performance as fewer lists need
> to be iterated.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>

We badly need this done in a more generic way. Besides shared caches,
I've done a bunch of percpu algorithms where "amount of x stranded on
percpu lists" is a major consideration and this would be preferable over
percpu lists (including in fs/aio.c).
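
For illustration only, the sibling-sharing idea in the quoted patch
description can be sketched in plain userspace C. This is not the code
from the patch: build_cpu_list_map() and the first_sibling[] input are
hypothetical names, and the sketch assumes the topology is already known
as "lowest-numbered CPU sharing a core with this one". Each sibling group
gets one list index, so the number of lists drops from the CPU count to
the core count.

```c
#include <assert.h>

#define MAX_CPUS 64	/* arbitrary bound for this sketch */

/*
 * Hypothetical input: first_sibling[cpu] is the lowest-numbered CPU that
 * shares a core with cpu, so first_sibling[cpu] <= cpu and a group
 * leader maps to itself.
 *
 * Output: cpu_to_list[cpu] is the index of the list that cpu uses.
 * Returns the number of distinct lists needed.
 */
static int build_cpu_list_map(const int *first_sibling, int nr_cpus,
			      int *cpu_to_list)
{
	int nr_lists = 0;

	assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int first = first_sibling[cpu];

		assert(first >= 0 && first <= cpu);

		if (first == cpu)
			/* Group leader: allocate a new list index. */
			cpu_to_list[cpu] = nr_lists++;
		else
			/* Sibling: reuse the leader's list index. */
			cpu_to_list[cpu] = cpu_to_list[first];
	}

	return nr_lists;
}
```

With 8 CPUs where CPU n and CPU n+4 are siblings, this yields 4 lists
instead of 8. Fewer lists also means fewer places for items to sit
stranded, which is the consideration raised in the reply above.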