On 8/8/24 03:07, Guenter Roeck wrote:
> On 8/6/24 16:24, Thomas Gleixner wrote:
>> Cc+: Helge, parisc ML
>>
>> We're chasing a weird failure which has been tracked down to the
>> placement of the division library functions (I assume they are imported
>> from libgcc).
>>
>> See the thread starting at:
>>
>>   https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx
>>
>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>>
>>>> So this change adds 16 bytes to __softirq() which moves the division
>>>> functions up by 16 bytes. That's all it takes to make the stupid go
>>>> away....
>>>
>>> Heh I was actually wondering if the division is somehow messed up because
>>> maxobj = order_objects() and order_objects() does a division. Now I suspect
>>> it even more.
>>
>> check_slab() calls into that muck, but I checked the disassembly of a
>> working and a broken kernel and the only difference there is the
>> displacement offset when the code calculates the call address, but
>> that's as expected a difference of 16 bytes.
>>
>> Now it becomes interesting.
>>
>> I added an unused function after __do_softirq() into the softirq text
>> section and filled it with ASM nonsense so that it occupies exactly one
>> page. That moves $$divoI, which is what check_slab() calls, exactly one
>> page forward:
>>
>
> With the above added to my tree, I can also play around with the code.
> Here is the next weird one:
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 4927edec6a8c..b8a33966d858 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>  	}
>
>  	maxobj = order_objects(slab_order(slab), s->size);
> +
> +	pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
> +
>  	if (slab->objects > maxobj) {
>  		slab_err(s, slab, "objects %u > max %u",
>  			slab->objects, maxobj);
>
> results in:
>
> ##### slab->objects=21 maxobj=21
> =============================================================================
> BUG kmem_cache_node (Not tainted): objects 21 > max 16

But is this printed from the same attempt? The pr_info_once() might have
printed earlier and then stopped (as it's _once) and the error case might
have happened only later, and there was nothing printed in between as the
kmalloc caches are created in a loop.

> As Thomas noticed, this only happens if the divide assembler code is within a certain
> address range.
>
> Ok, now I am really lost.
>
> Guenter
>
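
To make the pr_info_once() question above concrete, here is a minimal
userspace sketch, not kernel code. print_once(), first_failure() and
struct sample are hypothetical stand-ins, and the order of the samples is
an assumption made only for illustration; the values 21/21 and 21/16 are
the ones from the report. It shows how a once-only print can come from an
earlier, passing call while the error comes from a later one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two values check_slab() compares. */
struct sample {
	unsigned int objects;
	unsigned int maxobj;
};

/* Counts how many times the once-only print actually fired. */
static int once_prints;

/* Userspace imitation of a print-once helper: only the first call prints. */
static void print_once(unsigned int objects, unsigned int maxobj)
{
	static bool printed;

	if (printed)
		return;
	printed = true;
	once_prints++;
	printf("##### slab->objects=%u maxobj=%u\n", objects, maxobj);
}

/* Returns the index of the first failing sample, or -1 if all pass. */
static int first_failure(const struct sample *s, int n)
{
	int i;

	assert(s != NULL && n >= 0);

	for (i = 0; i < n; i++) {
		/* Fires for s[0] only, regardless of which sample fails. */
		print_once(s[i].objects, s[i].maxobj);

		if (s[i].objects > s[i].maxobj) {
			printf("BUG: objects %u > max %u\n",
			       s[i].objects, s[i].maxobj);
			return i;
		}
	}
	return -1;
}
```

If that is what happened, switching the debug line to a plain pr_info()
would log every call and show whether the 21/21 line and the failure
belong to the same check_slab() invocation.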