The kallmodsyms patch series was originally posted in Nov 2019, and the
thread
(https://lore.kernel.org/linux-kbuild/20191114223036.9359-1-eugene.loh@xxxxxxxxxx/t/#u)
shows review comments, questions, and feedback from interested parties.
All review comments have been satisfied, as far as I know: in particular
Yamada's note about translation units that are shared between built-in
modules is satisfied with a better representation which is also much,
much smaller.

A kernel tree containing this series alone:
  https://github.com/oracle/dtrace-linux-kernel kallmodsyms/6.1-rc2

The whole point of symbols is that their names are unique: you can look
up a symbol and get back a unique address, and vice versa.  Alas, because
/proc/kallsyms (rightly) reports all symbols, even hidden ones, it does
not really satisfy this requirement.  Large numbers of symbols are
duplicated many times (just search for __list_del_entry!), and while
usually these are just out-of-lined things defined in header files and
thus all have the same implementation, it does make it needlessly hard to
figure out which one is which in stack dumps, when tracing, and such
things.

Right now the kernel has no way at all to tell these apart, and nor has
the user: their addresses differ and that's all.  Which module did they
come from?  Which object file?  We don't know.  Figuring out which is
which when tracing needs a combination of guesswork and luck.  In
discussions at LPC it became clear that this annoys not just me but Steve
Rostedt and others, so it's probably desirable to fix this.

It turns out that the linker, and the kernel build system, can be made to
give us everything we need to resolve this once and for all.  This series
provides a new /proc/kallmodsyms which is like /proc/kallsyms except that
it annotates every (textual) symbol which comes from a built-in kernel
module with the module's name, in square brackets: if a symbol is used by
multiple modules, it gets [multiple] [names].
(We also add corresponding new fields in the kallsyms iterator.)

But that's not quite enough: some symbols are still ambiguous,
particularly those that appear in the non-modular parts of the core
kernel but also some things that appear in built-in modules.  We annotate
such symbols with cut-down {object file} names: the combination of
symbol, [module] [names] and {object file name} is unique.  (The object
file names are cut down to save space: we store only the shortest suffix
needed to distinguish symbols from each other.  It's fairly rare even to
see two/level names, let alone three/level/ones.  We also save even more
space by annotating every symbol in a given object file with the object
file name if we annotate any of them.)

In brief we do this by mapping from address ranges to object files (with
assistance from the linker map file), then mapping from those object
files to built-in kernel modules and object file names.  Because the
number of object files is much smaller than the number of symbols,
because we fuse address range and object file entries together if
possible, and because we don't even store object file names unless we
need to, this is a fairly efficient representation, even with a bit of
extra complexity to allow object files to be in more than one module at
once.

The size impact of all of this is minimal: in testing, vmlinux grew by
16632 bytes, and the compressed vmlinux only grew by 12544 bytes (about
0.1% of a 10MiB kernel): though this is very configuration-dependent, it
seems likely to scale roughly with the kernel as a whole.
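
To make the two-step mapping above a bit more concrete, here is a minimal
Python sketch of the idea.  It is not the code in scripts/kallsyms.c or
kernel/kallsyms.c: the addresses, the table layout, and the names
build_table, lookup and annotation are all made up for illustration, and
it uses the plain basename where the series stores the shortest
distinguishing suffix.  It only shows address ranges being mapped to
object files, object files to module names, and neighbouring entries with
identical annotations being fused.

```python
from bisect import bisect_right

# Illustrative link-map data: (start address, object file), sorted by
# address.  Each range runs up to the start of the next one.
RANGES = [
    (0x1000, "arch/x86/events/rapl.o"),
    (0x1400, "arch/x86/events/rapl.o"),  # same object again: fusable
    (0x2000, "arch/x86/events/amd/core.o"),
    (0x3000, "drivers/net/ethernet/cavium/liquidio/lio_ethtool.o"),
]

# Illustrative object file -> built-in module names.  Objects that are
# not part of any module (core kernel) are simply absent.
OBJ2MOD = {
    "arch/x86/events/rapl.o": ("rapl",),
    "drivers/net/ethernet/cavium/liquidio/lio_ethtool.o":
        ("liquidio", "liquidio_vf"),
}

# Objects assumed to hold otherwise-ambiguous symbols, which therefore
# need an {object file} annotation.
NEED_OBJNAME = {"arch/x86/events/amd/core.o"}


def build_table(ranges, obj2mod, need_objname):
    """Turn address ranges into (start, (modules, objname)) entries,
    fusing neighbours that end up with identical annotations."""
    table = []
    for start, obj in ranges:
        mods = obj2mod.get(obj, ())
        # Simplification: basename only, not the shortest unique suffix.
        objname = obj.rsplit("/", 1)[-1] if obj in need_objname else None
        note = (mods, objname)
        if table and table[-1][1] == note:
            continue  # the previous entry already covers this range
        table.append((start, note))
    return table


def lookup(table, addr):
    """Return (modules, objname) for the range containing addr."""
    starts = [start for start, _ in table]
    i = bisect_right(starts, addr) - 1
    return table[i][1] if i >= 0 else ((), None)


def annotation(mods, objname):
    """Format annotations in the [module] {objfile} style used here."""
    parts = ["[%s]" % m for m in mods]
    if objname:
        parts.append("{%s}" % objname)
    return " ".join(parts)


TABLE = build_table(RANGES, OBJ2MOD, NEED_OBJNAME)
print(annotation(*lookup(TABLE, 0x3004)))  # [liquidio] [liquidio_vf]
```

In this toy table the two rapl.o ranges collapse into one entry, so the
table has three entries for four input ranges: the same effect, at a much
smaller scale, that keeps the real tables small.
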

This is all controlled by a new config parameter CONFIG_KALLMODSYMS,
which when set results in output in /proc/kallmodsyms that looks like
this:

ffffffff97606e50 t not_visible
ffffffff97606e70 T perf_msr_probe
ffffffff97606f80 t test_msr [rapl]
ffffffffa6007350 t rapl_pmu_event_stop [rapl]
ffffffffa6007440 t rapl_pmu_event_del [rapl]
ffffffffa6007460 t rapl_hrtimer_handle [rapl]
ffffffffa6007500 t rapl_pmu_event_read [rapl]
ffffffffa6007520 t rapl_pmu_event_init [rapl]
ffffffffa6007630 t rapl_cpu_offline [rapl]
ffffffffa6007710 t amd_pmu_event_map {core.o}
ffffffffa6007750 t amd_pmu_add_event {core.o}
ffffffffa6007760 t amd_put_event_constraints_f17h {core.o}

The modular symbols are notated as [rapl] even if rapl is built into the
kernel.  Further, at least one symbol notated as {core.o} would have been
ambiguous without that notation.  If we look a little further down, we
see:

ffffffff97607a70 t cmask_show {core.o}
ffffffff97607ab0 t inv_show {core.o}
ffffffff97607ae0 t edge_show {core.o}
ffffffff97607b10 t umask_show {core.o}
ffffffff97607b40 t event_show {core.o}

where event_show in particular is highly ambiguous and appears in many
object files, all of which are now notated with different {object file
names}.

Further down, we see what happens when object files are reused by
multiple modules, all of which are built in to the kernel, and some of
which contain symbols that are ambiguously-named even within that set of
modules:

ffffffff97d7aed0 t liquidio_pcie_mmio_enabled [liquidio]
ffffffff97d7aef0 t liquidio_pcie_resume [liquidio]
ffffffff97d7af00 t liquidio_ptp_adjtime [liquidio]
ffffffff97d7af50 t liquidio_ptp_enable [liquidio]
ffffffff97d7af70 t liquidio_get_stats64 [liquidio]
ffffffff97d7b0f0 t liquidio_fix_features [liquidio]
ffffffff97d7b1c0 t liquidio_get_port_parent_id [liquidio]
[...]
ffffffff97d824c0 t lio_vf_rep_modinit [liquidio]
ffffffff97d824f0 t lio_vf_rep_modexit [liquidio]
ffffffff97d82520 t lio_ethtool_get_channels [liquidio] [liquidio_vf]
ffffffff97d82600 t lio_ethtool_get_ringparam [liquidio] [liquidio_vf]
ffffffff97d826a0 t lio_get_msglevel [liquidio] [liquidio_vf]
ffffffff97d826c0 t lio_vf_set_msglevel [liquidio] [liquidio_vf]
ffffffff97d826e0 t lio_get_pauseparam [liquidio] [liquidio_vf]
ffffffff97d82710 t lio_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffff97d82e70 t lio_vf_get_ethtool_stats [liquidio] [liquidio_vf]
[...]
ffffffff97d91a80 t cn23xx_vf_mbox_thread [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91aa0 t cpumask_weight.constprop.0 [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91ac0 t cn23xx_vf_msix_interrupt_handler [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91bd0 t cn23xx_vf_get_oq_ticks [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91c00 t cn23xx_vf_ask_pf_to_do_flr [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91c70 t cn23xx_octeon_pfvf_handshake [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d91e20 t cn23xx_setup_octeon_vf_device [liquidio] [liquidio_vf] {cn23xx_vf_device.o}
ffffffff97d92060 t octeon_mbox_read [liquidio] [liquidio_vf]
ffffffff97d92230 t octeon_mbox_write [liquidio] [liquidio_vf]
[...]
ffffffff97d946b0 t octeon_alloc_soft_command_resp [liquidio] [liquidio_vf]
ffffffff97d947e0 t octnet_send_nic_data_pkt [liquidio] [liquidio_vf]
ffffffff97d94820 t octnet_send_nic_ctrl_pkt [liquidio] [liquidio_vf]
ffffffff97d94ab0 t liquidio_get_stats64 [liquidio_vf]
ffffffff97d94c10 t liquidio_fix_features [liquidio_vf]
ffffffff97d94cd0 t wait_for_pending_requests [liquidio_vf]

Like /proc/kallsyms, the output is sorted by address, so keeps the
curious property of /proc/kallsyms that symbols may appear repeatedly
with different addresses: but now, unlike in /proc/kallsyms, we can see
that those symbols appear repeatedly because they are *different symbols*
that ultimately belong to different modules or different object files,
all of which are built in to the kernel.

Note that kernel symbols for built-in modules will probably appear
interspersed with other symbols that are part of different modules and
non-modular always-built-in symbols, which, as usual, have no
square-bracketed module denotation (though they might have an {object
file name}).

As with /proc/kallsyms, non-root usage produces addresses that are all
zero.

(Now that kallmodsyms data uses very little space, the new
CONFIG_KALLMODSYMS option might perhaps be something people don't want to
bother with: maybe we can just control it via CONFIG_KALLSYMS or
something?)

Limitations:

 - This approach only works for textual symbols (and weak ones).  I don't
   see any way to make it work for data symbols etc: except for
   initialized data they don't really have corresponding object files at
   all and they tend to get merged together anyway.

 - Non-built-in modules can also have ambiguous symbols in them in
   different input object files: they aren't handled yet because kallsyms
   never runs over modules to create the necessary sections.  This is
   fixable, but it's probably best handled in another patch series.
   (kallsyms would need to do much less work for modules: only the
   sections introduced by this patch series would need emission at all,
   and no [module] notations would be needed, only {objfile}.)

 - Section start/end symbols necessarily lie on the boundary between
   object files, so are sometimes misreported as being in the wrong
   object file or module.  This is unlikely to be too troublesome for
   these symbols in particular, but if anyone can figure out a way to fix
   this I'd be happy to do it.

 - There is no BPF iterator support yet (it's just a matter of adding it
   if needed).

The commits in this series all have reviewed-by tags: they're all from
internal reviews, so please ignore them.

Differences from v8, February 2022:

 - Add object file name handling, emitting only those object names needed
   to disambiguate symbols, shortening them as much as possible
   compatible with that.

 - Rename .kallsyms_module_names to .kallsyms_mod_objnames now that it
   contains object file names too.

 - Fix a bug in optimize_obj2mod that prevented proper reuse of module
   names for object files appearing in both multimodule modules and
   single-module modules: saves a few KiB more, often more than the space
   increase due to object file name handling.

 - Rebased atop v6.1-rc2: move modules_thick.builtin generation into the
   top-level Kbuild accordingly, and adjust to getopt_long use in
   scripts/kallsyms.

 - Significant revisions to the cover letter.

 - Add proof-of-concept kallmodsyms module support to perf.

Differences from v7, December 2021:

 - Adjust for changes in the v5.17 merge window.  Adjust a few commit
   messages and shrink the cover letter.

 - Drop the symbol-size patch, probably better done from userspace.

Differences from v6, November 2021:

 - Adjust for rewrite of confdata machinery in v5.16 (tristate.conf
   handling is now more of a rewrite than a reversion)

Differences from v5, October 2021:

 - Fix generation of mapfiles under UML

Differences from v4, September 2021:

 - Fix building of tristate.conf if missing (usually concealed by the
   syncconfig being run for other reasons, but not always: the kernel
   test robot spotted it).

 - Forward-port atop v5.15-rc3.

Differences from v3, August 2021:

 - Fix a kernel test robot warning in get_ksymbol_core (possible use of
   uninitialized variable if kallmodsyms was wanted but
   kallsyms_module_offsets was not present, which is most unlikely).

Differences from v2, June 2021:

 - Split the series up.  In particular, the size impact of the table
   optimizer is now quantified, and the symbol-size patch is split out
   and turned into an RFC patch, with the /proc/kallmodsyms format before
   that patch lacking a size column.  Some speculation on how to make the
   symbol sizes less space-wasteful is added (but not yet implemented).

 - Drop a couple of unnecessary #includes, one unnecessarily exported
   symbol, and a needless de-staticing.

Differences from v1, in 2019:

 - Move from a straight symbol->module name mapping to a mapping from
   address-range to TU to module name list, bringing major space savings
   over the previous approach and support for object files used by many
   built-in modules at the same time, at the cost of a slightly more
   complex approach (unavoidably so, I think, given that we have to merge
   three data sources together: the link map in .tmp_vmlinux.ranges, the
   nm output on stdin, and the mapping from TU name to module names in
   modules_thick.builtin).

   We do opportunistic merging of TUs if they cite the same modules and
   reuse module names where doing so is simple: see optimize_obj2mod
   below.
   I considered more extensive searches for mergeable entries and more
   intricate encodings of the module name list allowing TUs that are used
   by overlapping sets of modules to share their names, but such modules
   are rare enough (and such overlapping sharings are vanishingly rare)
   that it seemed likely to save only a few bytes at the cost of much
   more hard-to-test code.  This is doubly true now that the tables
   needed are only a few kilobytes in length.

Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
Signed-off-by: Eugene Loh <eugene.loh@xxxxxxxxxx>
Reviewed-by: Kris Van Hees <kris.van.hees@xxxxxxxxxx>

Nick Alcock (8):
  kbuild: bring back tristate.conf
  kbuild: add modules_thick.builtin
  kbuild: generate an address ranges map at vmlinux link time
  kallsyms: introduce sections needed to map symbols to built-in modules
  kallsyms: optimize .kallsyms_modules*
  kallsyms: distinguish text symbols fully using object file names
  kallsyms: add /proc/kallmodsyms for text symbol disambiguation
  perf: proof-of-concept kallmodsyms support

 .gitignore                       |    1 +
 Documentation/dontdiff           |    1 +
 Documentation/kbuild/kconfig.rst |    5 +
 Kbuild                           |   22 +
 Makefile                         |    9 +-
 init/Kconfig                     |    9 +
 kernel/kallsyms.c                |  277 ++++++-
 kernel/kallsyms_internal.h       |   14 +
 scripts/Kbuild.include           |    6 +
 scripts/Makefile                 |    6 +
 scripts/Makefile.modbuiltin      |   56 ++
 scripts/kallsyms.c               | 1187 +++++++++++++++++++++++++++++-
 scripts/kconfig/confdata.c       |   41 +-
 scripts/link-vmlinux.sh          |   15 +-
 scripts/modules_thick.c          |  200 +++++
 scripts/modules_thick.h          |   48 ++
 tools/perf/builtin-kallsyms.c    |   35 +-
 tools/perf/util/event.c          |   14 +-
 tools/perf/util/machine.c        |    6 +-
 tools/perf/util/machine.h        |    1 +
 tools/perf/util/symbol.c         |  207 ++++--
 tools/perf/util/symbol.h         |   12 +-
 22 files changed, 2073 insertions(+), 99 deletions(-)
 create mode 100644 scripts/Makefile.modbuiltin
 create mode 100644 scripts/modules_thick.c
 create mode 100644 scripts/modules_thick.h

-- 
2.38.0.266.g481848f278