On Tue, Mar 4, 2025 at 3:47 PM Menglong Dong <menglong8.dong@xxxxxxxxx> wrote:
>
> On Tue, Mar 4, 2025 at 2:16 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Mar 04, 2025 at 06:38:53AM +0100, Peter Zijlstra wrote:
> > > On Tue, Mar 04, 2025 at 09:10:12AM +0800, Menglong Dong wrote:
> > > > Hello, sorry that I forgot to add something to the changelog. In fact,
> > > > I don't add extra 5-bytes anymore, which you can see in the 3rd patch.
> > > >
> > > > The thing is that we can't add extra 5-bytes if CFI is enabled. Without
> > > > CFI, we can make the padding space any value, such as 5-bytes, and
> > > > the layout will be like this:
> > > >
> > > > __align:
> > > >   nop
> > > >   nop
> > > >   nop
> > > >   nop
> > > >   nop
> > > > foo: -- __align +5
> > > >
> > > > However, the CFI will always make the cfi insn 16-bytes aligned. When
> > > > we set the FUNCTION_PADDING_BYTES to (11 + 5), the layout will be
> > > > like this:
> > > >
> > > > __cfi_foo:
> > > >   nop (11)
> > > >   mov $0x12345678, %reg
> > > >   nop (16)
> > > > foo:
> > > >
> > > > and the padding space is 32-bytes actually. So, we can just select
> > > > FUNCTION_ALIGNMENT_32B instead, which makes the padding
> > > > space 32-bytes too, and have the following layout:
> > > >
> > > > __cfi_foo:
> > > >   mov $0x12345678, %reg
> > > >   nop (27)
> > > > foo:
> > >
> > > *blink*, wtf is clang smoking.
> > >
> > > I mean, you're right, this is what it is doing, but that is somewhat
> > > unexpected. Let me go look at clang source, this is insane.
> >
> > Bah, this is because assemblers are stupid :/
> >
> > There is no way to tell them to have foo aligned such that there are at
> > least N bytes free before it.
> >
> > So what kCFI ends up having to do is align the __cfi symbol to the
> > function alignment, and then stuff enough nops in to make the real
> > symbol meet alignment.
> >
> > And the end result is utter insanity.
> >
> > I mean, look at this:
> >
> >       50: 2e e9 00 00 00 00               cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32 __x86_return_thunk-0x4
> >       56: 66 2e 0f 1f 84 00 00 00 00 00   cs nopw 0x0(%rax,%rax,1)
> >
> > 0000000000000060 <__cfi___probestub_initcall_level>:
> >       60: 90                              nop
> >       61: 90                              nop
> >       62: 90                              nop
> >       63: 90                              nop
> >       64: 90                              nop
> >       65: 90                              nop
> >       66: 90                              nop
> >       67: 90                              nop
> >       68: 90                              nop
> >       69: 90                              nop
> >       6a: 90                              nop
> >       6b: b8 b1 fd 66 f9                  mov $0xf966fdb1,%eax
> >
> > 0000000000000070 <__probestub_initcall_level>:
> >       70: 2e e9 00 00 00 00               cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32 __x86_return_thunk-0x4
> >
> >
> > That's 21 bytes wasted, for no reason other than that asm doesn't have a
> > directive to say: get me a place that is M before N alignment.
> >
> > Because ideally the whole above thing would look like:
> >
> >       50: 2e e9 00 00 00 00               cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32 __x86_return_thunk-0x4
> >       56: 66 2e 0f 1f 84                  cs nopw (%rax,%rax,1)
> >
> > 000000000000005b <__cfi___probestub_initcall_level>:
> >       5b: b8 b1 fd 66 f9                  mov $0xf966fdb1,%eax
> >
> > 0000000000000060 <__probestub_initcall_level>:
> >       60: 2e e9 00 00 00 00               cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32 __x86_return_thunk-0x4
>
> Hi, peter. Thank you for the testing, which is quite helpful
> to understand the whole thing.
>
> I was surprised at this too. Without CALL_PADDING, the cfi is
> nop(11) + mov; with CALL_PADDING, the cfi is mov + nop(11),
> which is weird, as it seems that we can select CALL_PADDING if
> CFI_CLANG to make things consistent. And I thought that it is
> designed to be this for some reasons :/
>
> Hmm......so what should we do now? Accept and bear it,
> or do something different?
>
> I'm good at clang, so the solution that I can think of is how to

*not good at*

> bear it :/
>
> According to my testing, the text size will increase:
>
> ~2.2% if we make FUNCTION_PADDING_BYTES 27 and select
> FUNCTION_ALIGNMENT_16B.
>
> ~3.5% if we make FUNCTION_PADDING_BYTES 27 and select
> FUNCTION_ALIGNMENT_32B.
>
> We don't have to select FUNCTION_ALIGNMENT_32B, so the
> worst case is to increase ~2.2%.
>
> What do you think?
>
> Thanks!
> Menglong Dong
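
The 21-byte figure quoted in the thread can be checked with a little address arithmetic. Below is a minimal sketch in Python, assuming a 16-byte function alignment, no extra FUNCTION_PADDING_BYTES, and the 5-byte `mov` hash instruction seen in the disassembly. The helper names (`align_up`, `kcfi_layout`, `ideal_layout`) are made up for illustration and do not come from clang or the kernel.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def kcfi_layout(prev_end, alignment=16, hash_len=5):
    """Layout described in the thread: the __cfi_ symbol is aligned first,
    then nops are inserted so that the function itself is aligned too."""
    cfi_start = align_up(prev_end, alignment)
    func_start = align_up(cfi_start + hash_len, alignment)
    # Everything between the previous code and the function, minus the hash
    # instruction itself, is pure padding.
    wasted = (func_start - prev_end) - hash_len
    return cfi_start, func_start, wasted


def ideal_layout(prev_end, alignment=16, hash_len=5):
    """Hypothetical layout if the assembler could place the hash instruction
    exactly hash_len bytes before an aligned function start."""
    func_start = align_up(prev_end + hash_len, alignment)
    cfi_start = func_start - hash_len
    wasted = cfi_start - prev_end
    return cfi_start, func_start, wasted
```

With the previous function ending at 0x56, as in the dump, `kcfi_layout(0x56)` places `__cfi_` at 0x60 and the function at 0x70 with 21 padding bytes, while `ideal_layout(0x56)` places them at 0x5b and 0x60 with 5 padding bytes.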