Hello Lianbo, I wanted to keep the story short, but here is the long version. This problem has been hiding for a long time due to a compiler alignment padding on 64bit machine. For example, on RHEL 8.0, “include/linux/slub_def.h?v=4.18.0-80”: 0101 void (*ctor)(void *); 0102 unsigned int inuse; /* Offset to metadata */ 0103 unsigned int align; /* Alignment */ 0104 unsigned int red_left_pad; /* Left redzone padding size */ 0105 const char *name; /* Name (only for display!) */ We have 3 “unsigned int” followed by “char*name” (64bit aligned), hence the compiler will add a 32bit padding after red_left_pad. With such padding (and a on little-endian machine, ie. x86_64), a bogus load_ULONG() instead of a load_UINT() can’t really be noticed if the padding is zero-filled. Now, on releases where we don’t have such compiler padding after red_left_pad, a bogus load_ULONG() instead of a load_UINT() will produce noticeable garbage. For example, on RHEL 7.8, “include/linux/slub_def.h?v=3.10.0-1127”: 0084 void (*ctor)(void *); 0085 int inuse; /* Offset to metadata */ 0086 int align; /* Alignment */ 0087 int reserved; /* Reserved bytes at the end of slabs */ 0088 RH_KABI_FILL_HOLE(int red_left_pad) /* Left redzone padding size
*/ 0089 const char *name; /* Name (only for display!) */ Here there won’t be any 32bit padding after red_left_pad as we have 4 “int” before “char*name”, so here a load_ULONG() instead of load_UINT() we will get bogus data. From a RHEL 7.8 crash dump, where the machine was booted with “slub_debug=FZP” (ie. Debugging all caches): crash> sys | grep REL RELEASE: 3.10.0-1127.el7.x86_64 crash> kmem -S names_cache CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME ffff8838ffc2c040 4096 1 336 48 32k names_cache […] KMEM_CACHE_NODE NODE SLABS PARTIAL PER-CPU ffff88017fdcfcc0 0 6 6 0 NODE 0 PARTIAL: SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0021799000 ffff88085e640000 0 7 0 7 ffffea002179a600 ffff88085e698000 0 7 0 7 ffffea00217fce00 ffff88085ff38000 0 7 0 7 ffffea00216df000 ffff88085b7c0000 0 7 0 7 ffffea00216dc400 ffff88085b710000 0 7 0 7 ffffea0021799c00 ffff88085e670000 0 7 0 7 […] crash> struct kmem_cache ffff8838ffc2c040 | grep red red_left_pad = 64, crash> kmem -s ffff88085e640040 <<< trying first object, should be free CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME ffff8838ffc2c040 4096 1 336 48 32k names_cache SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0021799000 ffff88085e640000 0 7 0 7 FREE / [ALLOCATED] [ffff88085e640000] <<< getting expected red zone address, but not free !! crash> set redzone off redzone: off crash> kmem -s ffff88085e640040 <<< trying again with redzone off CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME ffff8838ffc2c040 4096 1 336 48 32k names_cache SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0021799000 ffff88085e640000 0 7 0 7 FREE / [ALLOCATED] [ffc128105e640040] <<< getting a bogus object address, still not free crash> rd -32 0xffff8838ffc2c09c ffff8838ffc2c09c: 00000040 @... crash> rd -64 0xffff8838ffc2c09c ffff8838ffc2c09c: ffc1a00800000040 @....... crash> p /x 0xffff88085e640000+0xffc1a00800000040 $1 = 0xffc128105e640040 Here above we can see that red_left_pad was read by crash as 0xffc1a00800000040 instead of 0x40 !! With the patch: crash> kmem -s ffff88085e640040 CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME ffff8838ffc2c040 4096 1 336 48 32k names_cache SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0021799000 ffff88085e640000 0 7 0 7 FREE / [ALLOCATED] ffff88085e640000 <<< getting expected red zone address, and object free crash> set redzone off redzone: off crash> kmem -s ffff88085e640040 CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME ffff8838ffc2c040 4096 1 336 48 32k names_cache SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0021799000 ffff88085e640000 0 7 0 7 FREE / [ALLOCATED] ffff88085e640040 <<< getting expected object address, and object free Now, with respect to: > And also please add the kernel commit to patch log: > kernel commit 2ca6d39b3102 ("slub: make ->red_left_pad unsigned int") I don’t agree with this suggestion, this problem has nothing to do with “int” versus “unsigned int”. It is only related to doing a load_ULONG() (64bit) instead of a load_INT() or load_UINT() (32bit). For some releases, where we have a 32bit padding after red_left_pad, the problem won’t be noticed. But without a 32bit padding after red_left_pad, crash will get a bogus red_left_pad causing all objects to be reported as [ALLOCATED], and it will print bogus object addresses when using “set redzone off”. Cheers, Georges From: lijiang <lijiang@xxxxxxxxxx>
Hi, Aureau Thank you for the fix. On Thu, Aug 29, 2024 at 5:56 PM <devel-request@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Can you help add the result of the 'kmem address' command here? We can clearly see what error it is. And also please add the kernel commit to patch log: kernel commit 2ca6d39b3102 ("slub: make ->red_left_pad unsigned int")
This change looks good to me, but I still have a question: I can not reproduce the current issue, how did you reproduce this one? Can you help list the steps to reproduce? Thanks Lianbo
|
-- Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ Contribution Guidelines: https://github.com/crash-utility/crash/wiki