On 3/3/23, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Mar 3, 2023 at 11:37 AM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>>
>> I mentioned in the previous e-mail that memset is used a lot even
>> without the problematic opt, and have even shown the size distribution
>> of what's getting passed there.
>
> Well, I *have* been pushing Intel to try to fix memcpy and memset for
> over two decades by now, but with FSRM I haven't actually seen the
> huge problems any more.
>

rep *stos* remains crap even with FSRM, but at the moment I don't have
sensible tests handy, nor the Ice Lake CPU I tested on.

> I actually used to have the reverse of your hack for this - I've had
> various hacks over the years that made memcpy and memset be inlined
> "rep movs/stos", which (along with inlined spinlocks) is a great way
> to see the _culprit_ (without having to deal with the call chains -
> which always get done the wrong way around imnsho).
>

That's all hackery which made sense before tooling like bpftrace;
people can do better now (see the second part of this e-mail).

I think there is a systemic problem which comes with the kzalloc API.
Consider:

static struct file *__alloc_file(int flags, const struct cred *cred)
{
        struct file *f;
        int error;

        f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
        if (unlikely(!f))
                return ERR_PTR(-ENOMEM);

        [bunch of the struct gets initialized here]

The allocation routine has no information about the size available at
compilation time, so it has to resort to a memset call at runtime. If
instead this were:

f = kmem_cache_alloc(...);
memset(f, 0, sizeof(*f));

... the compiler could in principle initialize the fields as indicated
by the code and emit a zerofill only for the rest. Interestingly, last
I checked neither clang nor gcc knew how to do it; they resort to a
full-sized memset anyway, which is quite a bummer.

Personally I grew up on dtrace; bpftrace I can barely use and I don't
know how to specifically get the caller, but kstack(2) seems like a
good enough workaround. As an example, here is a one-liner to show
crappers which do 0-sized ops:

bpftrace -e 'kprobe:memset,kprobe:memcpy /arg2 == 0/ { @[probe, kstack(2)] = count(); }'

One can trace all kinds of degeneracy like that without recompiling
anything, provided the funcs are visible to bpftrace.

Sample results from the above one-liner while doing 'make clean' in
the kernel dir:

@[kprobe:memcpy,
    memcpy+5
    realloc_array+78
]: 1
@[kprobe:memcpy,
    memcpy+5
    push_jmp_history+125
]: 1
@[kprobe:memset,
    memset+5
    blk_mq_dispatch_rq_list+687
]: 3
@[kprobe:memcpy,
    memcpy+5
    mix_interrupt_randomness+192
]: 4
@[kprobe:memcpy,
    memcpy+5
    d_alloc_pseudo+18
]: 59
@[kprobe:memcpy,
    memcpy+5
    add_device_randomness+111
]: 241
@[kprobe:memcpy,
    memcpy+5
    add_device_randomness+93
]: 527
@[kprobe:memset,
    memset+5
    copy_process+2904
]: 2054
@[kprobe:memset,
    memset+5
    dup_fd+283
]: 6162
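To make the pattern concrete, here is a minimal userspace sketch of the
explicit-memset variant (hypothetical struct and field names, plain
malloc standing in for kmem_cache_alloc). The point is that here
sizeof(*o) is a compile-time constant, so a compiler could in principle
skip re-zeroing the fields which get stored to immediately afterwards:

#include <stdlib.h>
#include <string.h>

/* hypothetical stand-in for struct file */
struct obj {
        int mode;
        int flags;
        long count;
        char rest[40];  /* fields left at zero */
};

static struct obj *alloc_obj(int mode, int flags)
{
        struct obj *o = malloc(sizeof(*o));

        if (!o)
                return NULL;
        /*
         * Unlike in kmem_cache_zalloc, the size is known at
         * compilation time here, so the zeroing of ->mode and
         * ->flags could be folded into the stores below, with a
         * zerofill emitted only for ->count and ->rest. As noted
         * above, neither gcc nor clang currently does this; both
         * emit the full-sized memset.
         */
        memset(o, 0, sizeof(*o));
        o->mode = mode;
        o->flags = flags;
        return o;
}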
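In the same spirit, the size distribution mentioned at the top can be
collected with the same machinery -- a sketch, reusing arg2 (the length
argument of memset) from the 0-size one-liner above:

bpftrace -e 'kprobe:memset { @[kstack(2)] = hist(arg2); }'

hist() buckets the lengths per caller, again without recompiling
anything.

-- 
Mateusz Guzik <mjguzik gmail.com>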