Cc+: Helge, parisc ML

We're chasing a weird failure which has been tracked down to the
placement of the division library functions (I assume they are imported
from libgcc).

See the thread starting at:

  https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx

On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
> On 8/6/24 19:33, Thomas Gleixner wrote:
>>
>> So this change adds 16 bytes to __softirq() which moves the division
>> functions up by 16 bytes. That's all it takes to make the stupid go
>> away....
>
> Heh I was actually wondering if the division is somehow messed up because
> maxobj = order_objects() and order_objects() does a division. Now I suspect
> it even more.

check_slab() calls into that muck, but I checked the disassembly of a
working and a broken kernel and the only difference there is the
displacement offset when the code calculates the call address, but
that's as expected a difference of 16 bytes.

Now it becomes interesting.

I added an unused function after __do_softirq() into the softirq text
section and filled it with ASM nonsense so that it occupies exactly one
page. That moves $$divoI, which is what check_slab() calls, exactly one
page forward:

-0000000041218c70 T $$divoI
+0000000041219c70 T $$divoI

Guess what happens? It falls on its nose again.

Now with that ASM gunk I can steer the size conveniently.
It works up to:

  0000000041219c50 T $$divoI

and fails for

  0000000041219c60 T $$divoI
  0000000041219c70 T $$divoI

and works again at

  0000000041219c80 T $$divoI

So I added the following:

+extern void testme(void);
+extern unsigned int testsize;
+
+unsigned int testsize = 192;
+
+void __init testme(void)
+{
+	pr_info("TESTME: %lu\n", PAGE_SIZE / testsize);
+}

called that _before_ mm_core_init() from init/main.c and adjusted my ASM
hack to make $$divoI be at:

  0000000041219c70 T $$divoI

again and surprisingly the output is:

[    0.000000] softirq: TESTME: 21

Now I went back to the hppa64 gcc version 12.2.0 again and did the same
ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 in the
page and the same happens.

So it's not a compiler dependent problem.

But then I added a testme() call to the error path and get:

[    0.000000] softirq: TESTME: 21
[    0.000000] =============================================================================
[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0

Now what's wrong? Adding more debug:

[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0 21

where the last '21' is the output of the same call which made maxobj go
south:

 static int check_slab(struct kmem_cache *s, struct slab *slab)
 {
 	int maxobj;
@@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache
 
 	maxobj = order_objects(slab_order(slab), s->size);
 	if (slab->objects > maxobj) {
-		slab_err(s, slab, "objects %u > max %u",
-			slab->objects, maxobj);
+		testme();
+		slab_err(s, slab, "objects %u > max %u size %u sorder %u %u",
+			 slab->objects, maxobj, s->size, slab_order(slab),
+			 order_objects(slab_order(slab), s->size));
 		return 0;
 	}
 	if (slab->inuse > slab->objects) {

I don't know and I don't want to know TBH...

Thanks,

	tglx