On 8/8/24 00:48, Vlastimil Babka wrote:
On 8/8/24 03:07, Guenter Roeck wrote:
On 8/6/24 16:24, Thomas Gleixner wrote:
Cc+: Helge, parisc ML
We're chasing a weird failure which has been tracked down to the
placement of the division library functions (I assume they are imported
from libgcc).
See the thread starting at:
https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx
On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
On 8/6/24 19:33, Thomas Gleixner wrote:
So this change adds 16 bytes to __softirq(), which moves the division
functions up by 16 bytes. That's all it takes to make the stupid go
away....
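This thread does not settle why a 16-byte shift should matter when the code is otherwise identical. As a hedged aid to reasoning, the sketch below shows the address arithmetic involved. Moving a routine by 16 bytes changes its offset within a page, and it can make the routine start or stop straddling a page boundary. Moving it by exactly one page, as in the padding experiment further down, preserves that offset. page_offset() and crosses_page() are hypothetical helpers, not kernel API. The page size is a parameter and is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of addr within its page; page_size must be a power of two. */
static uint64_t page_offset(uint64_t addr, uint64_t page_size)
{
	return addr & (page_size - 1);
}

/* True if the len bytes starting at addr span two or more pages. */
static bool crosses_page(uint64_t addr, uint64_t len, uint64_t page_size)
{
	if (len == 0)
		return false;
	return addr / page_size != (addr + len - 1) / page_size;
}
```

Consider a made-up 0x20-byte routine at 0x1000ff0 with 4 KiB pages. At that address, crosses_page() is true. Shifted 16 bytes to 0x1001000, it is false. Shifted a full 4096 bytes, the page offset is still 0xff0 and the routine still straddles a boundary.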
Heh, I was actually wondering if the division is somehow messed up, because
maxobj = order_objects() and order_objects() does a division. Now I suspect
it even more.
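Because order_objects() divides, one way to test that suspicion independently of the disassembly is to verify each quotient by multiplication. A correct q = n / d must satisfy q * d <= n < (q + 1) * d. Below is a minimal userspace sketch. It assumes that order_objects() computes (PAGE_SIZE << order) / size and that pages are 4 KiB. The names SKETCH_PAGE_SIZE, quotient_is_consistent(), checked_div() and sketch_order_objects() are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages. A local stand-in, not the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* q is the correct floor(n / d) iff q * d <= n < (q + 1) * d.
 * The product is widened to 64 bits so the check cannot overflow. */
static bool quotient_is_consistent(unsigned int n, unsigned int d,
				   unsigned int q)
{
	unsigned long long lo = (unsigned long long)q * d;

	return d != 0 && lo <= n && n < lo + d;
}

/* Divides, then verifies the result by multiplication, so a broken
 * division routine trips the assert instead of returning silently. */
static unsigned int checked_div(unsigned int n, unsigned int d)
{
	unsigned int q;

	assert(d != 0);
	q = n / d;
	assert(quotient_is_consistent(n, d, q));
	return q;
}

/* Assumed order_objects() formula: objects of `size` bytes that fit
 * in a slab of 2^order pages, with the quotient cross-checked. */
static unsigned int sketch_order_objects(unsigned int order, unsigned int size)
{
	return checked_div(SKETCH_PAGE_SIZE << order, size);
}
```

Take a hypothetical 8192-byte slab of 390-byte objects. The correct quotient is 21, and quotient_is_consistent(8192, 390, 21) holds. A bogus result of 16 fails the check.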
check_slab() calls into that muck, but I compared the disassembly of a
working and a broken kernel: the only difference there is the
displacement offset used when the code calculates the call address,
and that is, as expected, a difference of 16 bytes.
Now it becomes interesting.
I added an unused function after __do_softirq() into the softirq text
section and filled it with ASM nonsense so that it occupies exactly one
page. That moves $$divoI, which is what check_slab() calls, exactly one
page forward:
With the above added to my tree, I can also play around with the code.
Here is the next weird one:
diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..b8a33966d858 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
 	}
 
 	maxobj = order_objects(slab_order(slab), s->size);
+
+	pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
+
 	if (slab->objects > maxobj) {
 		slab_err(s, slab, "objects %u > max %u",
 			slab->objects, maxobj);
results in:
##### slab->objects=21 maxobj=21
=============================================================================
BUG kmem_cache_node (Not tainted): objects 21 > max 16
But is this printed from the same attempt? The pr_info_once() might have
printed earlier and then stopped (as it's _once), and the error case might
have happened only later, with nothing printed in between because the
kmalloc caches are created in a loop.
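The point here is that a print-once helper goes quiet after its first call. As a result, the values it printed need not belong to the check that later fails. Below is a minimal userspace sketch of those semantics. print_once(), lines_printed and check_all() are hypothetical stand-ins, not the kernel's pr_info_once() implementation. The sample values come from the updated log in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so callers can see how often the message fired. */
static int lines_printed;

/* Userspace stand-in for pr_info_once(): every expansion owns a static
 * flag, so the message appears only the first time that call site runs. */
#define print_once(...)                 \
	do {                            \
		static bool printed_;   \
		if (!printed_) {        \
			printed_ = true;        \
			lines_printed++;        \
			printf(__VA_ARGS__);    \
		}                       \
	} while (0)

/* Runs the objects-vs-maxobj sanity check over n caches in a loop.
 * Returns the index of the first failing entry, or -1 if all pass. */
static int check_all(const unsigned int *objects,
		     const unsigned int *maxobj, int n)
{
	int first_bad = -1;

	for (int i = 0; i < n; i++) {
		print_once("##### slab->objects=%u maxobj=%u\n",
			   objects[i], maxobj[i]);
		if (objects[i] > maxobj[i] && first_bad < 0)
			first_bad = i;
	}
	return first_bad;
}
```

The three entries from that log are objects {21, 25, 21} against maxobj {21, 25, 16}. Given these, the sketch prints a single line reporting 21/21, while the failing entry is at index 2.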
No, of course it isn't. Guess it was too late. Sorry for the noise.
Here is the updated log, after dropping the _once:
...
[ 0.000000] ##### slab->objects=21 maxobj=21
[ 0.000000] ##### slab->objects=25 maxobj=25
[ 0.000000] ##### slab->objects=21 maxobj=16
[ 0.000000] =============================================================================
[ 0.000000] BUG kmalloc-256 (Not tainted): objects 21 > max 16
So this works many times and then suddenly fails. I had assumed it was
the other way around, i.e. that it failed the very first time. Ok, back to debugging.
Thanks!
Guenter