----- Original Message -----
> OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx> writes:
> > OK. More simpler proof, the following is enough to convince you?

No, I'm a believer. I could pretty much verify your count by looking at the
task_struct slab cache.

Interestingly, though, I was looking at a SLUB corruption vmcore with your
patch applied, where a kmem_cache_cpu.freelist pointer got corrupted because
a use-after-free bug overwrote the next-free pointer in a freed kmalloc-32
object. When that corrupted object was later allocated, its bogus next-free
pointer was transferred to kmem_cache_cpu.freelist. It gets reported like so:

crash> kmem -s kmalloc-32
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
kmem: kmalloc-32: slab: 0 invalid freepointer: ffff001090e33f80
ffff880333001c00 kmalloc-32                32     122658    125440    980     4k
crash>

And here are the per-cpu kmem_cache_cpu structures, where the corrupted one
is from cpu 3:

crash> kmem_cache.cpu_slab ffff880333001c00
  cpu_slab = 0x163c0
crash> kmem_cache_cpu 0x163c0:a
[0]: ffff88033fc163c0
struct kmem_cache_cpu {
  freelist = 0xffff88031c028fa0,
  tid = 31034440,
  page = 0xffffea000c700a00,
  partial = 0xffffea000ca5d380
}
[1]: ffff88033fc963c0
struct kmem_cache_cpu {
  freelist = 0xffff8802d44c91c0,
  tid = 28218351,
  page = 0xffffea000b513240,
  partial = 0x0
}
[2]: ffff88033fd163c0
struct kmem_cache_cpu {
  freelist = 0xffff8802d442ba80,
  tid = 25768102,
  page = 0xffffea000b510ac0,
  partial = 0xffffea000c9bce40
}
[3]: ffff88033fd963c0
struct kmem_cache_cpu {
  freelist = 0xffff001090e33f80,    <== corrupted pointer
  tid = 26298247,
  page = 0xffffea0006438cc0,
  partial = 0xffffea0002ec8b80
}
crash>

But going back to the error report, the "slab: 0" is kind of confusing:

crash> kmem -s kmalloc-32
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
kmem: kmalloc-32: slab: 0 invalid freepointer: ffff001090e33f80
ffff880333001c00 kmalloc-32                32     122658    125440    980     4k
crash>

Unlike do_slab_slub(), when get_kmem_cache_slub_data() calls count_free_objects(),
si->slab is not set:

        switch (cmd)
        {
        case GET_SLUB_OBJECTS:
                if (!readmem(cpu_slab_ptr + OFFSET(page_inuse),
                    KVADDR, &inuse, sizeof(short),
                    "page inuse", RETURN_ON_ERROR))
                        return FALSE;
                objects = slub_page_objects(si, cpu_slab_ptr);
                if (!objects)
                        return FALSE;

                free_objects += objects - inuse;
                free_objects += count_free_objects(si, cpu_freelist);
                free_objects += count_cpu_partial(si, i);

                if (!node_total_avail)
                        total_objects += inuse;
                total_slabs++;

                break;

And then count_free_objects() calls get_freepointer(), which emits the
confusing error message. I'm thinking we should clarify that message, perhaps
by storing the cpu number in si->cpu and displaying it when si->slab is NULL?

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
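The suggestion of recording the cpu number in si->cpu and printing it when si->slab is NULL can be illustrated in isolation. The standalone C below is a minimal sketch, not crash code: struct slub_walk_info, its cpu field, format_invalid_freepointer() and count_free_objects_sim() are hypothetical names, and it assumes each free object keeps its next-free pointer in its first word (real SLUB stores it at kmem_cache.offset and may obfuscate it). It also shows how a use-after-free that overwrites one free object's next pointer surfaces as an "invalid freepointer" during the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the struct meminfo fields that
 * matter here. The "cpu" field is the proposed addition (an assumption).
 */
struct slub_walk_info {
	const char *curname;	/* cache name, e.g. "kmalloc-32" */
	unsigned long slab;	/* slab being walked; 0 for a per-cpu freelist */
	int cpu;		/* cpu whose kmem_cache_cpu is walked; -1 if none */
};

/* Build the "invalid freepointer" message, naming the cpu when slab is 0. */
static void format_invalid_freepointer(char *buf, size_t len,
				       const struct slub_walk_info *si,
				       uintptr_t freeptr)
{
	if (si->slab == 0 && si->cpu >= 0)
		snprintf(buf, len, "%s: cpu: %d invalid freepointer: %lx",
			 si->curname, si->cpu, (unsigned long)freeptr);
	else
		snprintf(buf, len, "%s: slab: %lx invalid freepointer: %lx",
			 si->curname, si->slab, (unsigned long)freeptr);
}

/*
 * Hypothetical walker over a simulated freelist whose objects live in
 * [base, base + size). Assumption: the next-free pointer is the first
 * word of each free object. Returns the number of free objects, or -1
 * with a diagnostic in errbuf.
 */
static long count_free_objects_sim(const struct slub_walk_info *si,
				   uintptr_t base, size_t size, size_t objsize,
				   uintptr_t freelist, char *errbuf, size_t errlen)
{
	size_t max_objs = size / objsize;
	long count = 0;
	uintptr_t p = freelist;

	assert(si != NULL && errbuf != NULL && objsize >= sizeof(uintptr_t));
	memset(errbuf, 0, errlen);	/* clear any previous diagnostic */

	while (p != 0) {
		/* A pointer outside the slab or off an object boundary is corrupt. */
		if (p < base || p >= base + size || (p - base) % objsize != 0) {
			format_invalid_freepointer(errbuf, errlen, si, p);
			return -1;
		}
		/* More hops than objects means the list loops back on itself. */
		if ((size_t)++count > max_objs) {
			snprintf(errbuf, errlen, "%s: freelist loop", si->curname);
			return -1;
		}
		p = *(const uintptr_t *)p;	/* follow the next-free pointer */
	}
	return count;
}
```

For example, with a 4-object buffer whose first free object points at the third, the walk counts 2 free objects. Overwriting the third object's next pointer with 0xffff001090e33f80 then yields "kmalloc-32: cpu: 3 invalid freepointer: ffff001090e33f80" rather than the ambiguous "slab: 0" form, while a walk with a real slab address set keeps the existing "slab: ..." wording.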