On Mon, Oct 14, 2024 at 10:13:58AM +0200, Peter Zijlstra wrote: > On Mon, Oct 14, 2024 at 02:47:01AM +0800, Kuan-Wei Chiu wrote: > > All current min heap API functions are marked with '__always_inline'. > > However, as the number of users increases, inlining these functions > > everywhere leads to a significant increase in kernel size. > > > > In performance-critical paths, such as when perf events are enabled and > > min heap functions are called on every context switch, it is important > > to retain the inline versions for optimal performance. To balance this, > > the original inline functions are kept, and additional non-inline > > versions of the functions have been added in lib/min_heap.c. > > The reason it is all __always_inline is because then the whole > min_heap_callbacks thing can be constant propagated and the func->less() > etc calls become direct calls. > > Doing out of line for this stuff, makes them indirect calls, and > indirect calls are super retarded expensive ever since spectre. But also > things like kCFI add significant cost to indirect calls. > > Something that would be a trivial subtract instruction becomes this > giant mess of an indirect function call. > Yes, I also learned from reading the lib/sort.c git log that indirect function calls can become especially expensive when CONFIG_MITIGATION_RETPOLINE is enabled. I'm not an expert in bcache, bcachefs, or dm-vdo, but when Andrew previously suggested moving these functions to lib/min_heap, Kent expressed his support. This led me to believe that in bcache and bcachefs (which are the primary users of min_heap), these indirect function calls are considered acceptable. However, that's just my assumption — I'll wait for Kent to chime in on this. > Given the whole min_heap thing is basically a ton of less() and swp() > calls, I really don't think this trade off makes any kind of sense. As for your point about min_heap being full of less() and swp() calls, we could handle swp() the same way it's done in lib/sort.c by providing a built-in swap function, which would help avoid indirect function calls in places other than fs/bcachefs/ec.c. However, for less(), it seems there might not be a way around it. Regards, Kuan-Wei