On 2021-08-17 11:17:18 [+0200], Vlastimil Babka wrote:
> On 8/17/21 11:12 AM, Sebastian Andrzej Siewior wrote:
> > On 2021-08-17 10:37:48 [+0200], Vlastimil Babka wrote:
> >> OK reproduced. Thanks, will investigate.
> > 
> > With the local_lock at the top, the needed alignment gets broken for dbl
> > cmpxchg. On RT it was working ;)
> 
> I'd rather have page and partial in the same cacheline as well, is it ok
> to just move the lock as last and not care about whether it straddles
> cachelines or not? (with CONFIG_SLUB_CPU_PARTIAL it will naturally start
> with the next cacheline).

Moving the lock last, as you suggested, appears to be more efficient and
saves a bit of memory:

RT + debug:
struct kmem_cache_cpu {
        void * *                   freelist;             /*     0     8 */
        long unsigned int          tid;                  /*     8     8 */
        struct page *              page;                 /*    16     8 */
        struct page *              partial;              /*    24     8 */
        local_lock_t               lock;                 /*    32   144 */

        /* size: 176, cachelines: 3, members: 5 */
        /* last cacheline: 48 bytes */
};

RT, no debug:
struct kmem_cache_cpu {
        void * *                   freelist;             /*     0     8 */
        long unsigned int          tid;                  /*     8     8 */
        struct page *              page;                 /*    16     8 */
        struct page *              partial;              /*    24     8 */
        local_lock_t               lock;                 /*    32    32 */

        /* size: 64, cachelines: 1, members: 5 */
};

no RT, no debug:
struct kmem_cache_cpu {
        void * *                   freelist;             /*     0     8 */
        long unsigned int          tid;                  /*     8     8 */
        struct page *              page;                 /*    16     8 */
        struct page *              partial;              /*    24     8 */
        local_lock_t               lock;                 /*    32     0 */

        /* size: 32, cachelines: 1, members: 5 */
        /* last cacheline: 32 bytes */
};

no RT, debug:
struct kmem_cache_cpu {
        void * *                   freelist;             /*     0     8 */
        long unsigned int          tid;                  /*     8     8 */
        struct page *              page;                 /*    16     8 */
        struct page *              partial;              /*    24     8 */
        local_lock_t               lock;                 /*    32    56 */

        /* size: 88, cachelines: 2, members: 5 */
        /* last cacheline: 24 bytes */
};

Sebastian
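
P.S.: For reference, a rough sketch of the layout being proposed (close
to, but not necessarily exactly, the upstream definition; assumes
CONFIG_SLUB_CPU_PARTIAL). The freelist/tid pair has to stay first and
adjacent: the fast path updates both at once via
this_cpu_cmpxchg_double(), which requires double-word alignment of the
pair. Moving the variably-sized local_lock_t to the end keeps that
alignment intact regardless of the RT/debug configuration:

struct kmem_cache_cpu {
	void **freelist;	/* next available object; paired with tid
				 * for this_cpu_cmpxchg_double(), so the
				 * pair needs double-word alignment */
	unsigned long tid;
	struct page *page;	/* slab we are allocating from */
#ifdef CONFIG_SLUB_CPU_PARTIAL
	struct page *partial;	/* partially allocated frozen slabs */
#endif
	local_lock_t lock;	/* moved last: per the pahole output above
				 * its size ranges from 0 (no RT, no debug)
				 * to 144 bytes (RT + debug), so it must
				 * not sit in front of the cmpxchg'ed pair */
};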