Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

Guenter Roeck <linux@xxxxxxxxxxxx> · Thu, 8 Aug 2024 07:46:13 -0700

On 8/8/24 00:48, Vlastimil Babka wrote:
On 8/8/24 03:07, Guenter Roeck wrote:
On 8/6/24 16:24, Thomas Gleixner wrote:
Cc+: Helge, parisc ML

We're chasing a weird failure which has been tracked down to the
placement of the division library functions (I assume they are imported
from libgcc).

See the thread starting at:

    https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx

On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
On 8/6/24 19:33, Thomas Gleixner wrote:

So this change adds 16 bytes to __do_softirq() which moves the division
functions up by 16 bytes. That's all it takes to make the stupid go
away....

Heh I was actually wondering if the division is somehow messed up because
maxobj = order_objects() and order_objects() does a division. Now I suspect
it even more.
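
For readers following along, here is a minimal userspace sketch of the
computation under suspicion: the slab size in bytes divided by the object
size. The names SKETCH_PAGE_SIZE and order_objects_sketch() are
hypothetical, the 4 KiB page size is an assumption, and this is not the
mm/slub.c source.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for order_objects(): how many objects of
 * 'size' bytes fit into a slab of (page size << order) bytes.
 * The '/' below is the operation that, per the thread, ends up as a
 * call into a division library function on parisc.
 */
static unsigned int order_objects_sketch(unsigned int order, unsigned int size)
{
	assert(size != 0);
	return (SKETCH_PAGE_SIZE << order) / size;
}
```

With these assumed inputs, order_objects_sketch(0, 256) gives 16 and
order_objects_sketch(1, 384) gives 21. The inputs are illustrative and
are not taken from the failing kernel.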

check_slab() calls into that muck, but I checked the disassembly of a
working and a broken kernel and the only difference there is the
displacement offset when the code calculates the call address, but
that's as expected a difference of 16 bytes.

Now it becomes interesting.

I added an unused function after __do_softirq() into the softirq text
section and filled it with ASM nonsense so that it occupies exactly one
page. That moves $$divoI, which is what check_slab() calls, exactly one
page forward:


With the above added to my tree, I can also play around with the code.
Here is the next weird one:

diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..b8a33966d858 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
          }

          maxobj = order_objects(slab_order(slab), s->size);
+
+       pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
+
          if (slab->objects > maxobj) {
                  slab_err(s, slab, "objects %u > max %u",
                          slab->objects, maxobj);

results in:

##### slab->objects=21 maxobj=21
=============================================================================
BUG kmem_cache_node (Not tainted): objects 21 > max 16

But is this printed from the same attempt? The pr_info_once() might have
printed earlier and then stopped (as it's _once) and the error case might
have happened only later, and there was nothing printed in between as the
kmalloc caches are created in a loop.
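
The concern can be illustrated with a small userspace sketch of
print-once behaviour. PRINT_ONCE_SKETCH(), check_sketch() and
lines_printed are hypothetical names, and the macro only approximates
what a once-only print does; it is not the kernel's pr_info_once()
implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many times the message was actually emitted. */
static int lines_printed;

/* Print on the first pass through this call site only. */
#define PRINT_ONCE_SKETCH(...)				\
	do {						\
		static bool printed_already;		\
		if (!printed_already) {			\
			printed_already = true;		\
			printf(__VA_ARGS__);		\
			lines_printed++;		\
		}					\
	} while (0)

/* Hypothetical check: logs once, then reports whether objects fits maxobj. */
static bool check_sketch(unsigned int objects, unsigned int maxobj)
{
	PRINT_ONCE_SKETCH("##### objects=%u maxobj=%u\n", objects, maxobj);
	return objects <= maxobj;
}
```

Calling check_sketch(21, 21) and then check_sketch(21, 16) logs only the
first pair, so the values behind the later failure never reach the log.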


No, of course it isn't. Guess it was too late. Sorry for the noise.
Here is the updated log, after dropping the _once:

...
[    0.000000] ##### slab->objects=21 maxobj=21
[    0.000000] ##### slab->objects=25 maxobj=25
[    0.000000] ##### slab->objects=21 maxobj=16
[    0.000000] =============================================================================
[    0.000000] BUG kmalloc-256 (Not tainted): objects 21 > max 16

So this works many times and then suddenly fails. I thought it was
the other way, that it failed the very first time. Ok, back to debugging.

Thanks!
Guenter