Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

Cc+: Helge, parisc ML

We're chasing a weird failure which has been tracked down to the
placement of the division library functions (I assume they are imported
from libgcc).

See the thread starting at:

  https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx

On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
> On 8/6/24 19:33, Thomas Gleixner wrote:
>> 
>> So this change adds 16 bytes to __softirq() which moves the division
>> functions up by 16 bytes. That's all it takes to make the stupid go
>> away....
>
> Heh I was actually wondering if the division is somehow messed up because
> maxobj = order_objects() and order_objects() does a division. Now I suspect
> it even more.

check_slab() calls into that muck, but I checked the disassembly of a
working and a broken kernel and the only difference there is the
displacement offset when the code calculates the call address, but
that's as expected a difference of 16 bytes.

Now it becomes interesting.

I added an unused function after __do_softirq() into the softirq text
section and filled it with ASM nonsense so that it occupies exactly one
page. That moves $$divoI, which is what check_slab() calls, exactly one
page forward:

    -0000000041218c70 T $$divoI
    +0000000041219c70 T $$divoI

Guess what happens? It falls on its nose again.

Now with that ASM gunk I can steer the size conveniently. It works up
to:

    0000000041219c50 T $$divoI

and fails for

    0000000041219c60 T $$divoI
    0000000041219c70 T $$divoI

and works again at

    0000000041219c80 T $$divoI

So I added the following:

+extern void testme(void);
+extern unsigned int testsize;
+
+unsigned int testsize = 192;
+
+void __init testme(void)
+{
+	pr_info("TESTME: %lu\n", PAGE_SIZE / testsize);
+}

called that _before_ mm_core_init() from init/main.c and adjusted my ASM
hack to make $$divoI be at:

    0000000041219c70 T $$divoI

again and surprisingly the output is:

    [    0.000000] softirq: TESTME: 21

Now I went back to the hppa64 gcc version 12.2.0 again and did the same
ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 in the
page and the same happens.

So it's not a compiler-dependent problem.

But then I added a testme() call to the error path and get:

[    0.000000] softirq: TESTME: 21
[    0.000000] =============================================================================
[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0

Now what's wrong?

Adding more debug:

[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0 21

where the last '21' is the output of the same call which made maxobj go
south:

 static int check_slab(struct kmem_cache *s, struct slab *slab)
 {
 	int maxobj;
@@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache
 
 	maxobj = order_objects(slab_order(slab), s->size);
 	if (slab->objects > maxobj) {
-		slab_err(s, slab, "objects %u > max %u",
-			slab->objects, maxobj);
+		testme();
+		slab_err(s, slab, "objects %u > max %u size %u sorder %u %u",
+			 slab->objects, maxobj, s->size, slab_order(slab),
+			 order_objects(slab_order(slab), s->size));
 		return 0;
 	}
 	if (slab->inuse > slab->objects) {

I don't know and I don't want to know TBH...

Thanks,

        tglx