On Fri, 11 Aug 2023 at 10:07, David Howells <dhowells@xxxxxxxxxx> wrote: > > Hmmm... It seems that using if-if-if rather than switch() gets optimised > better in terms of .text space. The attached change makes things a bit > smaller (by 69 bytes). Ack, and that also makes your change look more like the original code and more as just a plain "turn macros into inline functions". As a result the code diff initially seems a bit smaller too, but then at some point it looks like at least clang decides that it can combine common code and turn those 'ustep' calls into indirect calls off a conditional register, ie code like movq $memcpy_from_iter, %rax movq $memcpy_from_iter_mc, %r13 cmoveq %rax, %r13 [...] movq %r13, %r11 callq __x86_indirect_thunk_r11 Which is absolutely horrible. It might actually generate smaller code, but with all the speculation overhead, indirect calls are a complete no-no. They now cause a pipeline flush on a large majority of CPUs out there. That code generation is not ok, and the old macro thing didn't generate it (because it didn't have any indirect calls). And it turns out that __always_inline on those functions doesn't even help, because the fact that it's called through an indirect function pointer means that at least clang just keeps it as an indirect call. So I think you need to remove the changes you did to memcpy_from_iter(). The old code was an explicit conditional of direct calls: if (iov_iter_is_copy_mc(i)) return (void *)copy_mc_to_kernel(to, from, size); return memcpy(to, from, size); and now you do that iov_iter_is_copy_mc(i) ? memcpy_from_iter_mc : memcpy_from_iter); to pass in a function pointer. Not ok. Not ok at all. It may look clever, but function pointers are bad. Avoid them like the plague. Linus