Hello, On Wed, 01 Apr 2020, Andrew Morton wrote: > From: Kees Cook <keescook@xxxxxxxxxxxx> > Subject: slub: relocate freelist pointer to middle of object > > In a recent discussion[1] with Vitaly Nikolenko and Silvio Cesare, it > became clear that moving the freelist pointer away from the edge of > allocations would likely improve the overall defensive posture of the > inline freelist pointer. My benchmarks show no meaningful change to > performance (they seem to show it being faster), so this looks like a > reasonable change to make. > > Instead of having the freelist pointer at the very beginning of an > allocation (offset 0) or at the very end of an allocation (effectively > offset -sizeof(void *) from the next allocation), move it away from the > edges of the allocation and into the middle. This provides some > protection against small-sized neighboring overflows (or underflows), for > which the freelist pointer is commonly the target. (Large or well > controlled overwrites are much more likely to attack live object contents, > instead of attempting freelist corruption.) > > The vaunted kernel build benchmark, across 5 runs. Before: > > Mean: 250.05 > Std Dev: 1.85 > > and after, which appears mysteriously faster: > > Mean: 247.13 > Std Dev: 0.76 > > Attempts at running "sysbench --test=memory" show the change to be well in > the noise (sysbench seems to be pretty unstable here -- it's not really > measuring allocation). > > Hackbench is more allocation-heavy, and while the std dev is above the > difference, it looks like may manifest as an improvement as well: > > 20 runs of "hackbench -g 20 -l 1000", before: > > Mean: 36.322 > Std Dev: 0.577 > > and after: > > Mean: 36.056 > Std Dev: 0.598 > > [1] https://twitter.com/vnik5287/status/1235113523098685440 > > Link: http://lkml.kernel.org/r/202003051624.AAAC9AECC@keescook > Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx> > Acked-by: Christoph Lameter <cl@xxxxxxxxx> > Cc: Vitaly Nikolenko <vnik@xxxxxxxxxxx> > Cc: Silvio Cesare <silvio.cesare@xxxxxxxxx> > Cc: Christoph Lameter <cl@xxxxxxxxx>Cc: Pekka Enberg <penberg@xxxxxxxxxx> > Cc: David Rientjes <rientjes@xxxxxxxxxx> > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > --- > > mm/slub.c | 7 +++++++ > 1 file changed, 7 insertions(+) > With kernel v5.7-rc1 I am unable to boot when using the SLUB allocator and red zoning (slub_debug=Z), but otherwise a default config. Bisect points to this patch, and when reverting it, the kernel boots again. Splat: [...] [ 0.328713] rcu: Hierarchical RCU implementation. [ 0.329169] rcu: RCU event tracing is enabled. [ 0.329611] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8. [ 0.330251] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies. [ 0.330984] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 [ 0.332130] NR_IRQS: 4352, nr_irqs: 488, preallocated irqs: 16 [ 0.332713] general protection fault, probably for non-canonical address 0xccccccccccccccd4: 0000 [#1] SMP PTI [ 0.333680] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc1+ #3 [ 0.334280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 0.335079] RIP: 0010:deactivate_slab.isra.0+0x5b/0x460 [ 0.335582] Code: 48 8b b4 c7 e0 00 00 00 49 8b 44 24 20 31 ff 48 85 c0 40 0f 95 c7 83 c7 0f 89 7c 24 18 48 85 d2 0f 84 a0 00 00 00 41 8b 4e 20 <48> 8b 3c 0b 48 85 ff 0f 84 8c 00 00 00 49 8b 54 24 28 48 89 04 0b [ 0.337385] RSP: 0000:ffffffffb7e03c80 EFLAGS: 00010086 [ 0.337907] RAX: 0000000000000000 RBX: cccccccccccccccc RCX: 0000000000000008 [ 0.338688] RDX: cccccccccccccccc RSI: ffff91241c800f40 RDI: 000000000000000f [ 0.339473] RBP: ffffffffb7e03d20 R08: ffff91241fc2d230 R09: 0000000000000000 [ 0.340256] R10: ffff91241c89c010 R11: 0000000000000000 R12: ffffcf2f20722700 [ 0.341041] R13: cccccccccccccccc R14: ffff91241c802180 R15: ffffcf2f20722700 [ 0.341833] FS: 0000000000000000(0000) GS:ffff91241fc00000(0000) knlGS:0000000000000000 [ 0.342727] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.343359] CR2: ffff911e74e01000 CR3: 000000027460a001 CR4: 00000000000606b0 [ 0.344146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 0.344929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 0.345727] Call Trace: [ 0.345999] ? setup_object_debug.isra.0+0x1d/0x40 [ 0.346525] ? new_slab+0x195/0x340 [ 0.346909] ? init_object+0x2f/0x80 [ 0.347305] ___slab_alloc+0x526/0x570 [ 0.347717] ? kasprintf+0x4e/0x70 [ 0.348092] ? init_object+0x2f/0x80 [ 0.348488] ? string+0x42/0x50 [ 0.348834] ? kasprintf+0x4e/0x70 [ 0.349217] __kmalloc_track_caller+0x1d2/0x200 [ 0.349720] kvasprintf+0x64/0xc0 [ 0.350085] kasprintf+0x4e/0x70 [ 0.350442] ? kmem_cache_alloc_trace+0x188/0x1b0 [ 0.350962] __irq_domain_alloc_fwnode+0x8f/0xd0 [ 0.351474] arch_early_irq_init+0x16/0x90 [ 0.351923] start_kernel+0x2aa/0x4c2 [ 0.352325] secondary_startup_64+0xb6/0xc0 [ 0.352784] Modules linked in: [ 0.353124] random: get_random_bytes called from print_oops_end_marker+0x21/0x40 with crng_init=0 [ 0.353126] ---[ end trace 186486c23e10986d ]--- [ 0.354613] RIP: 0010:deactivate_slab.isra.0+0x5b/0x460 [ 0.355186] Code: 48 8b b4 c7 e0 00 00 00 49 8b 44 24 20 31 ff 48 85 c0 40 0f 95 c7 83 c7 0f 89 7c 24 18 48 85 d2 0f 84 a0 00 00 00 41 8b 4e 20 <48> 8b 3c 0b 48 85 ff 0f 84 8c 00 00 00 49 8b 54 24 28 48 89 04 0b [ 0.357255] RSP: 0000:ffffffffb7e03c80 EFLAGS: 00010086 [ 0.357829] RAX: 0000000000000000 RBX: cccccccccccccccc RCX: 0000000000000008 [ 0.358613] RDX: cccccccccccccccc RSI: ffff91241c800f40 RDI: 000000000000000f [ 0.359398] RBP: ffffffffb7e03d20 R08: ffff91241fc2d230 R09: 0000000000000000 [ 0.360181] R10: ffff91241c89c010 R11: 0000000000000000 R12: ffffcf2f20722700 [ 0.360965] R13: cccccccccccccccc R14: ffff91241c802180 R15: ffffcf2f20722700 [ 0.361755] FS: 0000000000000000(0000) GS:ffff91241fc00000(0000) knlGS:0000000000000000 [ 0.362645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.363275] CR2: ffff911e74e01000 CR3: 000000027460a001 CR4: 00000000000606b0 [ 0.364060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 0.364844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 0.365636] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.366393] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- Can you reproduce this? Let me know if you need more information. Thanks, -- Marco > --- a/mm/slub.c~slub-relocate-freelist-pointer-to-middle-of-object > +++ a/mm/slub.c > @@ -3581,6 +3581,13 @@ static int calculate_sizes(struct kmem_c > */ > s->offset = size; > size += sizeof(void *); > + } else if (size > sizeof(void *)) { > + /* > + * Store freelist pointer near middle of object to keep > + * it away from the edges of the object to avoid small > + * sized over/underflows from neighboring allocations. > + */ > + s->offset = ALIGN(size / 2, sizeof(void *)); > } > > #ifdef CONFIG_SLUB_DEBUG > _