Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 8 Aug 2024 08:53:52 -0700

On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Careful vs. the pr_once(). It's not necessarily the first allocation
> which trips up. I removed slab_err() in that condition and just printed
> the data:
>
> [    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14
> [    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 18
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 19
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 20
> [    0.000000] Order: 0 Size:  160 Nobj: 25 Maxobj: 16 25 Inuse:  5
> [    0.000000] Order: 2 Size:  672 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  2
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10
>
> The maxobj column shows the failed result and the result from the second
> invocation inside of the printk().

Hmm. There's a few patterns there:

 - the incorrect Maxobj is always 16, with wildly different sizes.

 - the correct value is always in that 21-25 range

and neither of these are particularly common cases for slab objects
(well, at least on x86-64).
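
As a rough cross-check of the numbers above, here is a minimal sketch
that recomputes the objects-per-slab value by plain unsigned division.
It assumes 4K base pages and that Maxobj is simply
(PAGE_SIZE << order) / size; the names SKETCH_PAGE_SIZE, slab_sample,
expected_maxobj and count_mismatches are made up for illustration and
are not kernel symbols. The rows are copied from the quoted log with
the duplicates dropped.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_SIZE 4096u

/* One row of the quoted debug output (hypothetical struct). */
struct slab_sample {
	unsigned int order;	/* "Order" column */
	unsigned int size;	/* "Size" column */
	unsigned int nobj;	/* "Nobj" column */
};

/* Rows from the quoted log, duplicates dropped. */
static const struct slab_sample samples[] = {
	{ 1,  384, 21 },
	{ 0,  168, 24 },
	{ 1,  320, 25 },
	{ 0,  160, 25 },
	{ 2,  672, 24 },
	{ 3, 1536, 21 },
};

/* Objects that fit in a slab of the given order, by plain division. */
static unsigned int expected_maxobj(unsigned int order, unsigned int size)
{
	return (SKETCH_PAGE_SIZE << order) / size;
}

/* Number of rows where plain division disagrees with the Nobj column. */
static size_t count_mismatches(void)
{
	size_t i, bad = 0;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		if (expected_maxobj(samples[i].order, samples[i].size) !=
		    samples[i].nobj)
			bad++;
	}
	return bad;
}
```

Under those assumptions every row reproduces the 21-25 value from a
correctly working division, and none of them comes out as 16.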

I actually went into the gcc sources to look at the libgcc routines
for the hppa $$divU routine, but apart from checking for trivial
powers-of-two and for divisions with small divisor values (<=17), all
it ends up being is a series of "ds" (divide step) and "addc"
instructions. I don't see how that could possibly mess up. It does end
up with the final addc in the delay slot of the return, but that's
normal parisc behavior (and here by "normal" I mean "it's a really
messed up instruction set that did everything wrong, including branch
delay slots").
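
To illustrate the general shape of a "one quotient bit per step"
divide, here is a minimal sketch in C. This is NOT the libgcc $$divU
code and does not model the exact semantics of the hppa "ds"
instruction; it is the simple restoring shift-and-subtract form, and
sketch_divu is a hypothetical name.

```c
#include <stdint.h>

/*
 * Shift-and-subtract unsigned divide: one quotient bit per iteration,
 * most significant bit first. Illustration only, not the libgcc routine.
 */
static uint32_t sketch_divu(uint32_t dividend, uint32_t divisor)
{
	uint32_t quotient = 0;
	uint64_t remainder = 0;	/* 64 bits so the shift cannot overflow */
	int bit;

	if (divisor == 0)
		return 0;	/* sketch only: divide by zero is not handled */

	for (bit = 31; bit >= 0; bit--) {
		/* Bring down the next dividend bit. */
		remainder = (remainder << 1) | ((dividend >> bit) & 1u);
		quotient <<= 1;
		/* If the divisor fits, subtract it and set the quotient bit. */
		if (remainder >= divisor) {
			remainder -= divisor;
			quotient |= 1u;
		}
	}
	return quotient;
}
```

The point is only that a loop like this has no data-dependent branches
that could plausibly turn 21 or 25 into 16.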

I do note that the $$divU function (which is what this all should use)
oddly doesn't show up as defined in 'nm' for me when I look at
Guenter's vmlinux file. So there's some odd linker thing going on, and
it *only* affects the $$div* functions.

Thomas' System.map shows some of the same effects, ie it shows $$divoI
(signed integer divide with overflow checking), but doesn't show
$$divU that is right after it. The reason I was looking was exactly
because this should be using $$divU, and clearly code alignment is
implicated somehow, but the exact alignment of $$divU wasn't obvious.

But it looks like "$$divU" should be somewhere between $$divoI and
$$divI_2, and in Guenter's bad case that's

  0000000041218c70 T $$divoI
  00000000412190d0 T $$divI_2

so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
into 0000000041219000?
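
The address arithmetic behind that guess can be sketched as follows,
assuming 4K pages (a 12-bit page shift). The helper names are
hypothetical; only the two addresses come from the symbol listing
above.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Page frame number of an address. */
static uint64_t page_of(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Nonzero if the two addresses lie on different pages. */
static int spans_page_boundary(uint64_t start, uint64_t end)
{
	return page_of(start) != page_of(end);
}

/* First address of the page following the one containing addr. */
static uint64_t next_page_start(uint64_t addr)
{
	return (page_of(addr) + 1) << SKETCH_PAGE_SHIFT;
}
```

With start = 0x41218c70 ($$divoI) and end = 0x412190d0 ($$divI_2),
spans_page_boundary() is true and next_page_start(start) is
0x41219000, so anything placed between the two symbols may sit on or
across that boundary.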

Some ITLB fill issue together with that delayed branch and a qemu bug?

                Linus