On Wed, Mar 5, 2025 at 6:59 PM Menglong Dong <menglong8.dong@xxxxxxxxx> wrote:
>
> I'm not sure if it works. However, indirect call is also used
> in function graph, so we still have better performance. Isn't it?
>
> Let me have a look at the code of the function graph first :/

Menglong,

Function graph infra isn't going to help.
"call foo" isn't a problem either.

But we have to step back. Per-function metadata is an optimization,
and it feels like we're optimizing prematurely here, without
collecting performance numbers first.

Let's implement multi-fentry with a generic get_metadata_by_ip() first.
In that case get_metadata_by_ip() will be backed by a hashtable, and
then we can compare its performance when it's implemented as a direct
lookup from ip-4 (this patch) vs. a hashtable (that does the 'ip' to
'metadata' lookup).

If/when we decide to do this per-function metadata, we can also
punt to the generic hashtable for CFI, IBT, FineIBT, etc. configs.
When mitigations are enabled the performance suffers anyway,
so hashtable lookup vs. direct ip-4 lookup won't make much difference.
So we can enable per-function metadata only on non-mitigation configs
when FUNCTION_ALIGNMENT=16.
There will be some number of bytes available before every function,
and if we can tell gcc/llvm to leave at least 5 bytes there, the growth
of vmlinux .text will be within noise.

So let's figure out the design of multi-fentry first, with a hashtable
for metadata, and decide next steps afterwards.
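
To make the comparison concrete, here is a rough userspace sketch of the
two lookup strategies. It is not kernel code and not taken from the
patch: struct func_metadata, md_insert(), md_array and
get_metadata_direct() are hypothetical names, the hash function is an
arbitrary choice, and the direct variant simply assumes a 32-bit index
stored in the 4 bytes preceding the function entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_HASH_BITS  10
#define MD_HASH_SIZE  (1u << MD_HASH_BITS)
#define MD_ARRAY_SIZE 16u

/* Hypothetical per-function metadata record. */
struct func_metadata {
	uintptr_t ip;               /* function entry address (hash key) */
	void *data;                 /* placeholder payload */
	struct func_metadata *next; /* hash bucket chain */
};

static struct func_metadata *md_table[MD_HASH_SIZE];
static struct func_metadata *md_array[MD_ARRAY_SIZE];

/* Multiplicative hash; function addresses are aligned, so the low
 * bits alone would distribute poorly. */
unsigned int md_hash(uintptr_t ip)
{
	uint64_t h = (uint64_t)ip * 0x9E3779B97F4A7C15ull;

	return (unsigned int)(h >> (64 - MD_HASH_BITS));
}

/* Add a record at the head of its bucket chain. */
void md_insert(struct func_metadata *md)
{
	unsigned int b = md_hash(md->ip);

	md->next = md_table[b];
	md_table[b] = md;
}

/* Generic variant: hashtable lookup from 'ip' to 'metadata'. */
struct func_metadata *get_metadata_by_ip(uintptr_t ip)
{
	struct func_metadata *md;

	for (md = md_table[md_hash(ip)]; md; md = md->next)
		if (md->ip == ip)
			return md;
	return NULL;
}

/* Direct variant (assumed layout): read a 32-bit index from the
 * 4 bytes just before the function entry and use it to index a
 * flat array. */
struct func_metadata *get_metadata_direct(const unsigned char *entry)
{
	uint32_t idx;

	memcpy(&idx, entry - 4, sizeof(idx));
	return idx < MD_ARRAY_SIZE ? md_array[idx] : NULL;
}
```

The hashtable variant needs no padding before the function and so works
on any config, while the direct variant trades those reserved bytes for
a lookup with no hashing or chain walk. Which one wins in practice is
exactly what the numbers should tell us.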