I recently worked on allowing BPF programs to work properly on
CLANG_CFI enabled kernels [1]. While doing this I found that fentry
programs fail to attach because DYNAMIC_FTRACE_WITH_CALL_OPS doesn't
work with CLANG_CFI.

Mark told me that the problem is that clang CFI places the type hash
immediately before any pre-function NOPs, and so where some functions
have pre-function NOPs and others do not, the type hashes are not at a
consistent offset (and effectively the functions have different ABIs and
cannot call one another).

I tried enabling both Clang CFI and -fpatchable-function-entry=4,2 to
see the behaviour and where this could fail. Here is an example:

This is the disassembly of jump_label_cmp(), which has two pre-function
NOPs with the CFI hash before them. So, the hash is at (addr - 12).

ffff80008033e9b0:       16c516ce        [kCFI hash for 'static int jump_label_cmp(const void *a, const void *b)']
ffff80008033e9b4:       d503201f        nop
ffff80008033e9b8:       d503201f        nop
ffff80008033e9bc <jump_label_cmp>:
ffff80008033e9bc:       d503245f        bti     c
ffff80008033e9c0:       d503201f        nop
ffff80008033e9c4:       d503201f        nop
[.....]

The following is the disassembly of the sort_r() function, which makes
an indirect call to jump_label_cmp() but loads the CFI hash from
(addr - 4) rather than (addr - 12). So, it is loading a NOP instruction
and not the hash.

ffff80008084e19c <sort_r>:
[.....]
0xffff80008084e454 <+696>:   ldur    w16, [x8, #-4]   (#-4 here should be #-12)
0xffff80008084e458 <+700>:   movk    w17, #0x16ce
0xffff80008084e45c <+704>:   movk    w17, #0x16c5, lsl #16
0xffff80008084e460 <+708>:   cmp     w16, w17
0xffff80008084e464 <+712>:   b.eq    0xffff80008084e46c <sort_r+720>  // b.none
0xffff80008084e468 <+716>:   brk     #0x8228
0xffff80008084e46c <+720>:   blr     x8

The comparison fails because w16 holds the NOP encoding rather than the
hash, so the brk is taken and this causes a CFI exception.

As I haven't spent more time trying to understand this, I don't yet know
why the compiler emits two NOPs before some functions and none before
others.

I would propose the following changes to the compiler that could fix
this issue:

1. The kCFI hash should always be generated at (func_addr - 4). This
   would make the calling code consistent.

2. The two (n) NOPs should be generated before the kCFI hash. We would
   modify the ftrace code to look for these NOPs at (func_addr - 12)
   and (func_addr - 8) when CFI is enabled, and at (func_addr - 8) and
   (func_addr - 4) when CFI is disabled.

The generated code could then look like:

ffff80008033e9b0:       d503201f        nop
ffff80008033e9b4:       d503201f        nop
ffff80008033e9b8:       16c516ce        kCFI hash
ffff80008033e9bc <jump_label_cmp>:
ffff80008033e9bc:       d503245f        bti     c
ffff80008033e9c0:       d503201f        nop
ffff80008033e9c4:       d503201f        nop
[.....]

Note: I am overlooking the alignment requirements here. We might need to
add another NOP above the hash to make sure the top two NOPs are aligned
to 8 bytes.

I am not sure how useful this solution is, and I look forward to hearing
from others who know more about this topic.

Thanks,
Puranjay

[1] https://lore.kernel.org/bpf/20240227151115.4623-1-puranjay12@xxxxxxxxx/
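
P.S. To make the offset arithmetic above easier to check, here is a rough sketch in C. The helper names are hypothetical (they are not kernel or compiler APIs), and it only assumes 4-byte AArch64 instructions and the two layouts described in this mail. It also ignores the alignment question.

```c
#include <stdint.h>

#define INSN_SIZE 4u	/* AArch64 instructions are 4 bytes wide */

/*
 * Hypothetical helper: address of the kCFI hash in the layout observed
 * above, where the hash is placed before the pre-function NOPs.
 */
uint64_t hash_addr_current(uint64_t func_addr, unsigned int pre_nops)
{
	return func_addr - INSN_SIZE * (pre_nops + 1);
}

/*
 * Hypothetical helper: address that a CFI-checking caller reads,
 * i.e. the "ldur w16, [x8, #-4]" seen in sort_r().
 */
uint64_t hash_addr_read_by_caller(uint64_t func_addr)
{
	return func_addr - INSN_SIZE;
}

/*
 * Hypothetical helper: address of the first pre-function NOP in the
 * proposed layout. With CFI the NOPs sit above the hash, without CFI
 * they sit directly above the function entry.
 */
uint64_t first_nop_addr_proposed(uint64_t func_addr, unsigned int pre_nops,
				 int cfi_enabled)
{
	return func_addr - INSN_SIZE * (pre_nops + (cfi_enabled ? 1 : 0));
}
```

For jump_label_cmp() at ffff80008033e9bc with two pre-function NOPs, hash_addr_current() gives ffff80008033e9b0 while hash_addr_read_by_caller() gives ffff80008033e9b8, which is the second NOP in the first listing. In the proposed layout the first NOP would be at (func_addr - 12) with CFI and (func_addr - 8) without it.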