On Thu, Nov 10, 2022 at 03:31:31PM +0800, Vlastimil Babka wrote:
> On 11/10/22 05:40, Theodore Ts'o wrote:
> > On Thu, Nov 10, 2022 at 01:48:32AM +0200, Aaro Koskinen wrote:
> >>
> >> Some of the reported SLOB issues have been actually real driver bugs,
> >> that go unnoticed when SLUB/SLAB are used (unless perhaps debug stuff
> >> is enabled). I'm not saying kernel should keep SLOB, but it's good at
> >> failing early when there is a bug. See e.g. commit 120ee599b5bf ("staging:
> >> octeon-usb: prevent memory corruption")
> >
> > Out of curiosity, are these bugs that would have been found using
> > KASAN or some of the other kernel sanitizers and/or other debugging
> > tools we have at our disposal?
>
> Hopefully slub_debug redzoning would be able to trigger the bug described in
> commit 120ee599b5bf above, which is:
>
> > octeon-hcd will crash the kernel when SLOB is used. This usually happens
> > after the 18-byte control transfer when a device descriptor is read.
> > The DMA engine is always transferring full 32-bit words and if the
> > transfer is shorter, some random garbage appears after the buffer.
> > The problem is not visible with SLUB since it rounds up the allocations
> > to word boundary, and the extra bytes will go undetected.
>
> Ah, actually it wouldn't *now* as SLUB would make the allocation fall into
> kmalloc-32 cache and only add redzone beyond 32 bytes. But with upcoming
> changes by Feng Tang, this should work.

I wrote a simple test case to simulate this:

	static noinline void dma_align_test(void)
	{
		char *buf;

		buf = kmalloc(18, GFP_KERNEL);
		/* intentional out-of-bounds writes past the 18-byte buffer */
		buf[18] = 0;
		buf[19] = 0;
		kfree(buf);
	}

And with slub_debug on and the slub redzone patchset [1], it did catch the
out-of-bounds access, as seen in the dmesg:

"
=============================================================================
BUG kmalloc-32 (Not tainted): kmalloc Redzone overwritten
-----------------------------------------------------------------------------

0xffff888005ebb032-0xffff888005ebb033 @offset=50. First byte 0x0 instead of 0xcc
Allocated in dma_align_test+0x1b/0x29 age=6554 cpu=1 pid=1
 __kmem_cache_alloc_node+0x2a7/0x320
 kmalloc_trace+0x27/0xa0
 dma_align_test+0x1b/0x29
 late_slub_debug+0xa/0x11
 do_one_initcall+0x87/0x2a0
 ...
Slab 0xffffea000017aec0 objects=21 used=19 fp=0xffff888005ebbf20 flags=0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
Object 0xffff888005ebb020 @offset=32 fp=0x0000000000000000

Redzone  ffff888005ebb000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
Redzone  ffff888005ebb010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
Object   ffff888005ebb020: 50 92 28 00 81 88 ff ff 01 00 00 00 11 00 b6 07  P.(.............
Object   ffff888005ebb030: 6b a5 00 00 cc cc cc cc cc cc cc cc cc cc cc cc  k...............
Redzone  ffff888005ebb040: cc cc cc cc cc cc cc cc                          ........
Padding  ffff888005ebb0a4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
Padding  ffff888005ebb0b4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a              ZZZZZZZZZZZZ
"

[1]. https://lore.kernel.org/lkml/20221021032405.1825078-1-feng.tang@xxxxxxxxx/

Thanks,
Feng

> slub_debug would also have a chance of catching buffer overflows by kernel
> code itself, not DMA, and tell you about it sooner and more gracefully than
> crashing. KASAN also, even with a higher chance and precision, if it's
> available for your arch and your device constraints can tolerate its larger
> overhead.
>
> > 						- Ted
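
To tie this back to the octeon-hcd case quoted above: the usual way to keep
a word-granular DMA engine from scribbling past a short buffer is to round
the allocation up to the DMA word size. The sketch below is illustrative
only; the helper name and the use of ALIGN() are assumptions, not the actual
fix from commit 120ee599b5bf:

	#include <linux/align.h>
	#include <linux/slab.h>
	#include <linux/types.h>

	/*
	 * The DMA engine always writes full 32-bit words, so allocate a
	 * multiple of 4 bytes: an 18-byte transfer then gets a 20-byte
	 * buffer and the two trailing bytes land inside the object
	 * instead of in the redzone (or a neighbouring object).
	 */
	static void *alloc_word_aligned_dma_buf(size_t len, gfp_t gfp)
	{
		return kmalloc(ALIGN(len, sizeof(u32)), gfp);
	}

With the kmalloc redzone checking discussed above, the unaligned 18-byte
allocation is what trips the "kmalloc Redzone overwritten" report, while a
word-aligned length keeps the DMA writes inside the requested size.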