On 8/6/24 04:40, Linus Torvalds wrote:
> [ Let's drop random people and bring in Vlastimil ]

tglx was reproducing it so I'm adding him back

> Vlastimil,
>  it turns out that the "this patch" is entirely a red herring, and the
> problem comes and goes randomly with just some code layout issues. See
>
>    http://server.roeck-us.net/qemu/parisc64-6.10.3/
>
> for more detail, particularly you'll see the "log.bad.gz" with the full log.

[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
[    0.000000] Slab 0x0000000041ed0000 objects=21 used=5 fp=0x00000000434003d0 flags=0x200(workingset|section=0|zone=0)

flags tell us this came from the partial list (workingset), and there's no
head flag, so it's order-0

since the error was detected, it basically throws the slab page away and
tries another one

[    0.000000] BUG kmem_cache (Tainted: G B ): objects 25 > max 16
[    0.000000] Slab 0x0000000041ed0080 objects=25 used=6 fp=0x0000000043402790 flags=0x240(workingset|head|section=0|zone=0)

this was also from the partial list, but it has the head flag, so it's at
least order-1. Two things are weird:

- max=16 is the same as above even though it should be at least double, as
  the slab page's order is larger
- objects=25 also isn't at least twice objects=21

All the following are:

[    0.000000] BUG kmem_cache (Tainted: G B ): objects 25 > max 16
[    0.000000] Slab 0x0000000041ed0300 objects=25 used=1 fp=0x000000004340c150 flags=0x40(head|section=0|zone=0)

we depleted the partial list so it's allocating new slab pages, which are
also at least order-1

It looks like the maxobj calculation is bogus; it would be useful to see what
values it calculates from. I'm attaching a diff, but maybe it will also hide
the issue...

If someone has a /proc/slabinfo from a working boot with otherwise the same
config, it might also be enough to guess what values should be expected
there, at least the s->size.

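To make the "should be at least double" reasoning concrete, here is a minimal userspace sketch of the division that the maxobj check depends on. It is not the kernel code: the sketch_* names are hypothetical, and the 4 KiB page size is an assumption, since the page size of the failing config is not stated in this thread.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages; the failing config may differ. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical userspace mirror of the maxobj arithmetic: how many
 * objects of 'size' bytes fit into a slab of 2^order pages.
 */
static unsigned int sketch_order_objects(unsigned int order, unsigned int size)
{
	assert(size != 0);	/* avoid dividing by zero in the sketch */
	return (SKETCH_PAGE_SIZE << order) / size;
}

/*
 * Mirrors the comparison that produced "objects %u > max %u":
 * returns 1 if the object count fits, 0 if it would be reported.
 */
static int sketch_objects_plausible(unsigned int objects,
				    unsigned int order, unsigned int size)
{
	return objects <= sketch_order_objects(order, size);
}
```

For a hypothetical size of 256, this gives 16 at order 0 and 32 at order 1, so under these assumptions the same max=16 for an order-0 slab and for a slab with the head flag would not come out of this division with one consistent size.
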
objects=21 vs 25 also seem odd though

used=5 with used=6 in the first two also suggests we already passed this
code successfully for creating a number of kmalloc caches and only then it
started failing, that's also weird.

> See also
>
>    https://lore.kernel.org/all/87y15a4p4h.ffs@tglx/
>
> for this thread.
>
> I don't think this is really a slub issue, since it only happens on
> parisc, but maybe you can see what would make parisc different, and
> what could possibly make it all timing- or layout-dependent.
>
>              Linus
>
> On Sun, 4 Aug 2024 at 11:36, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>>
>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>
>> [    0.000000] =============================================================================
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> [    0.000000] -----------------------------------------------------------------------------
>>
>> This never stops until the emulation aborts.

diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..ec4ed5215f2f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1386,8 +1386,8 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
 
 	maxobj = order_objects(slab_order(slab), s->size);
 	if (slab->objects > maxobj) {
-		slab_err(s, slab, "objects %u > max %u",
-			slab->objects, maxobj);
+		slab_err(s, slab, "objects %u > max %u (order %d size %u)",
+			slab->objects, maxobj, slab_order(slab), s->size);
 		return 0;
 	}
 	if (slab->inuse > slab->objects) {