On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote: > Cc+: Helge, parisc ML > > We're chasing a weird failure which has been tracked down to the > placement of the division library functions (I assume they are > imported > from libgcc). > > See the thread starting at: > > > https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx > > On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: > > On 8/6/24 19:33, Thomas Gleixner wrote: > > > > > > So this change adds 16 bytes to __softirq() which moves the > > > division > > > functions up by 16 bytes. That's all it takes to make the stupid > > > go > > > away.... > > > > Heh I was actually wondering if the division is somhow messed up > > because > > maxobj = order_objects() and order_objects() does a division. Now I > > suspect > > it even more. > > check_slab() calls into that muck, but I checked the disassembly of a > working and a broken kernel and the only difference there is the > displacement offset when the code calculates the call address, but > that's as expected a difference of 16 bytes. > > Now it becomes interesting. > > I added a unused function after __do_softirq() into the softirq text > section and filled it with ASM nonsense so that it occupies exactly > one > page. That moves $$divoI, which is what check_slab() calls, exactly > one > page forward: > > -0000000041218c70 T $$divoI > +0000000041219c70 T $$divoI > > Guess what happens? If falls on it's nose again. > > Now with that ASM gunk I can steer the size conveniently. It works up > to: > > 0000000041219c50 T $$divoI > > and fails for > > 0000000041219c60 T $$divoI > 0000000041219c70 T $$divoI > > and works again at > > 0000000041219c80 T $$divoI So just on this, you seem to have proved that only exact multiples of 48 work. In terms of how PA-RISC caching works that's completely nuts ... however, there may be something else at work, like stack frame alignment. > > So I added the following: > > +extern void testme(void); > +extern unsigned int testsize; > + > +unsigned int testsize = 192; > + > +void __init testme(void) > +{ > + pr_info("TESTME: %lu\n", PAGE_SIZE / testsize); > +} > > called that _before_ mm_core_init() from init/main.c and adjusted my > ASM hack to make $$divoI be at: > > 0000000041219c70 T $$divoI > > again and surprisingly the output is: > > [ 0.000000] softirq: TESTME: 21 OK, why is that surprising? 4096/192 is 21 due to integer rounding. > Now I went back to the hppa64 gcc version 12.2.0 again and did the > same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 > in the page and the same happens. > > So it's not a compiler dependent problem. > > But then I added a testme() call to the error path and get: > > [ 0.000000] softirq: TESTME: 21 > [ 0.000000] > ===================================================================== > ======== > [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 > size 192 sorder 0 > > Now what's wrong? > > Adding more debug: > > [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 > size 192 sorder 0 21 > > where the last '21' is the output of the same call which made maxobj > go > south: > > static int check_slab(struct kmem_cache *s, struct slab *slab) > { > int maxobj; > @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache > > maxobj = order_objects(slab_order(slab), s->size); > if (slab->objects > maxobj) { > - slab_err(s, slab, "objects %u > max %u", > - slab->objects, maxobj); > + testme(); > + slab_err(s, slab, "objects %u > max %u size %u sorder > %u %u", > + slab->objects, maxobj, s->size, > slab_order(slab), > + order_objects(slab_order(slab), s->size)); > return 0; > } > if (slab->inuse > slab->objects) { > > I don't know and I don't want to know TBH... OK, so you're telling us we have a problem with slab_order on parisc ... that's folio_order, so it smells like a parisc bug with folio_test_large? Unfortuntely I'm a bit pissed in an airport lounge on my way to the UK, so I've lost access to my pa test rig and can't test further for a while. Regards, James