On Fri, 27 Dec 2024 10:09:40 -0800 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Thu, 26 Dec 2024 at 19:45, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > > > > > > > Btw, does this actually happen when the compiler does the mcount thing for us? > > > > Yes. > > Ok, that's actually good. > > I'm not really worried about the "unused symbols aren't in kallsyms" > issue, even if it confuses the mcount logic. THAT confusion is easy to > deal with by either adding symbol size information (which I think > would be a good thing in general, although perhaps not worth it). > > Even without the symbol size, the mcount issue can be dealt with by > just knowing that the mcount location has to be at the very beginning > of the function and just taking the offset - that we already do have - > into account. This is actually what we do today. A check is made against the location of the fentry/mcount, and if it's too far away from the function entry, it is considered a weak function. The distance is architecture dependent, as well as options (adding ENDBR and such needs to be accounted for). Now we still need to list them in the available_filter_functions, as the order of what's in mcount_loc is used by the set_filter_function, and if we "hide" any, it can screw up the accounting, because you can enable functions via the index into that file. Using an index is an O(1) operation, where as by name it needs to search all addresses and call kallsyms for a name look up and then compare to what was passed in. Tooling uses the index, as it's much faster to enable several functions (which by name turns into an O(n^2) operation). Enabling a thousand functions by name can take over a minute to complete, whereas by index takes less than a second. For now, we have in available_filter_functions: trace_initcall_finish_cb initcall_blacklisted do_one_initcall __ftrace_invalid_address___708 rootfs_init_fs_context wait_for_initramfs __ftrace_invalid_address___68 calibration_delay_done calibrate_delay Where those "__ftrace_invalid_address_*" is detected as a weak function. This is ugly and a hack, but we still need a place holder for them :-( I would love to find a better way to handle this. Now, the .mcount_loc is sorted at build time. If there's a way to find which addresses are no longer visible, then that code could possibly remove them from the mcount_loc. Perhaps, it could do a 'nm -S' and check to make sure that all addresses are within a listed size, and if not, remove it. That way, we could get rid of those place holders. > > I was more worried that there might also be some much deeper confusion > with the linker actually garbage collecting the unused weak function > code away, and now an unused symbol that kallsyms doesn't know about > wouldn't just have an unexpected mcount pointer to it, but the mcount > pointer would actually be stale and point to some unrelated code. > > So as long as *that* isn't what is happening, this all seems fairly benign. For now, the mcount_loc is just an annoyance as we have those ugly place holders. The real bug is with live kernel patching, where there's a mismatch in the function size. IIUC, the building of the module for the live patching finds the size of the function it's patching with 'nm'. And then before patching, asks kallsyms for the size of the function. Because kallsyms uses the next function to determine the size, it returns the size of the function plus the size of the weak function. There's a mismatch and live kernel patching refuses to patch the function. By adding the actual size of the function to kallsyms, that would fix the problem. The size doesn't need to be exported to /proc/kallsyms, as you mentioned before, all users happen to be in-kernel. The one downsize of storing the numbers, is that we will need to store a size for every function, and that can add up. $ wc -l /proc/kallsyms 208784 /proc/kallsyms That's 208 thousand numbers to store! Perhaps this is the one advantage of having a weak placeholder in kallsyms as well. It may save memory. -- Steve