Hello, Il giorno dom 31 dic 2023 alle ore 06:35 Masahiro Yamada <masahiroy@xxxxxxxxxx> ha scritto: > > On Tue, Dec 5, 2023 at 6:48 AM Alessandro Carminati (Red Hat) > <alessandro.carminati@xxxxxxxxx> wrote: > > > > In the kernel environment, situations frequently arise where identical > > names are shared among symbols within both the core image and modules. > > While this doesn't pose issues for the kernel's binary itself, it > > complicates trace or probe operations using tools like kprobe. > > > > This patch introduces "kas_alias" to address this challenge. > > > > During the kernel's build process, just before linking the vmlinux > > image in the "scripts/link-vmlinux.sh", symbol name frequencies are > > collected. > > This collection includes both the core kernel components and modules. > > Subsequently, within the same action, the nm data relative to vmlinux > > is modified by adding aliases based on the comprehensive symbol > > information gathered. > > > > The collection process occurs in two phases: > > > > 1. First phase: Executed during the linking of vmlinux, "kas_alias" scans > > all symbols provided by the 'nm' data against the vmlinux core image > > and all objects used for module linkage. This phase requires all > > modules objects to be produced at this stage, thereby adding a vmlinux > > dependency for linking modules in 'scripts/Makefile.modfinal'. > > > > 2. Second phase: In a subsequent run in the same build, "kas_alias" > > processes module objects and injects aliases into the objects' symbol > > tables where necessary. This operation is done by modifying > > 'scripts/Makefile.modfinal' to include an action for each processed > > module. > > > > Example: > > > > Consider the symbol "device_show", you can expect an output like the > > following: > > > > ~ # cat /proc/kallsyms | grep " name_show" > > ffffcaa2bb4f01c8 t name_show > > ffffcaa2bb4f01c8 t name_show@kernel_irq_irqdesc_c_264 > > ffffcaa2bb9c1a30 t name_show > > ffffcaa2bb9c1a30 t name_show@drivers_pnp_card_c_186 > > ffffcaa2bbac4754 t name_show > > ffffcaa2bbac4754 t name_show@drivers_regulator_core_c_678 > > ffffcaa2bbba4900 t name_show > > ffffcaa2bbba4900 t name_show@drivers_base_power_wakeup_stats_c_93 > > ffffcaa2bbec4038 t name_show > > ffffcaa2bbec4038 t name_show@drivers_rtc_sysfs_c_26 > > ffffcaa2bbecc920 t name_show > > ffffcaa2bbecc920 t name_show@drivers_i2c_i2c_core_base_c_660 > > ffffcaa2bbed3840 t name_show > > ffffcaa2bbed3840 t name_show@drivers_i2c_i2c_dev_c_100 > > ffffcaa2bbef7210 t name_show > > ffffcaa2bbef7210 t name_show@drivers_pps_sysfs_c_66 > > ffffcaa2bbf03328 t name_show > > ffffcaa2bbf03328 t name_show@drivers_hwmon_hwmon_c_72 > > ffffcaa2bbff6f3c t name_show > > ffffcaa2bbff6f3c t name_show@drivers_remoteproc_remoteproc_sysfs_c_215 > > ffffcaa2bbff8d78 t name_show > > ffffcaa2bbff8d78 t name_show@drivers_rpmsg_rpmsg_core_c_455 > > ffffcaa2bbfff7a4 t name_show > > ffffcaa2bbfff7a4 t name_show@drivers_devfreq_devfreq_c_1395 > > ffffcaa2bc001f60 t name_show > > ffffcaa2bc001f60 t name_show@drivers_extcon_extcon_c_389 > > ffffcaa2bc009890 t name_show > > ffffcaa2bc009890 t name_show@drivers_iio_industrialio_core_c_1396 > > ffffcaa2bc01212c t name_show > > ffffcaa2bc01212c t name_show@drivers_iio_industrialio_trigger_c_51 > > ffffcaa2bc025e2c t name_show > > ffffcaa2bc025e2c t name_show@drivers_fpga_fpga_mgr_c_618 > > ffffcaa2a052102c t name_show [hello] > > ffffcaa2a052102c t name_show@hello_hello_c_8 [hello] > > ffffcaa2a051955c t name_show [rpmsg_char] > > ffffcaa2a051955c t name_show@drivers_rpmsg_rpmsg_char_c_365 [rpmsg_char] > > > > where hello, is a plain helloworld module built OOT. > > > > Tested-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> > > Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@xxxxxxxxx> > > > > --- > > > > NOTE1: > > About the symbols name duplication that happens as consequence of the > > inclusion compat_binfmt_elf.c does, it is evident that this corner is > > inherently challenging the addr2line approach. > > Attempting to conceal this limitation would be counterproductive. > > > > compat_binfmt_elf.c includes directly binfmt_elf.c, addr2line can't help > > but report all functions and data declared by that file, coming from > > binfmt_elf.c. > > > > My position is that, rather than producing a more complicated pipeline > > to handle this corner case, it is better to fix the compat_binfmt_elf.c > > anomaly. > > > > This patch does not deal with the two potentially problematic symbols > > defined by compat_binfmt_elf.c > > > > Changes from v1: > > * Integrated changes requested by Masami to exclude symbols with prefixes > > "_cfi" and "_pfx". > > * Introduced a small framework to handle patterns that need to be excluded > > from the alias production. > > * Excluded other symbols using the framework. > > * Introduced the ability to discriminate between text and data symbols. > > * Added two new config symbols in this version: > > CONFIG_KALLSYMS_ALIAS_DATA, which allows data for data, and > > CONFIG_KALLSYMS_ALIAS_DATA_ALL, which excludes all filters and provides > > an alias for each duplicated symbol. > > > > https://lore.kernel.org/all/20230711151925.1092080-1-alessandro.carminati@xxxxxxxxx/ > > > > Changes from v2: > > * Alias tags are created by querying DWARF information from the vmlinux. > > * The filename + line number is normalized and appended to the original > > name. > > * The tag begins with '@' to indicate the symbol source. > > * Not a change, but worth mentioning, since the alias is added to the > > existing list, the old duplicated name is preserved, and the livepatch > > way of dealing with duplicates is maintained. > > * Acknowledging the existence of scenarios where inlined functions > > declared in header files may result in multiple copies due to compiler > > behavior, though it is not actionable as it does not pose an operational > > issue. > > * Highlighting a single exception where the same name refers to different > > functions: the case of "compat_binfmt_elf.c," which directly includes > > "binfmt_elf.c" producing identical function copies in two separate > > modules. > > > > https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@xxxxxxxxx/ > > > > Changes from v3: > > * kas_alias was rewritten in Python to create a more concise and > > maintainable codebase. > > * The previous automation process used by kas_alias to locate the vmlinux > > and the addr2line has been replaced with an explicit command-line switch > > for specifying these requirements. > > * addr2line has been added into the main Makefile. > > * A new command-line switch has been introduced, enabling users to extend > > the alias to global data names. > > > > https://lore.kernel.org/all/20230828080423.3539686-1-alessandro.carminati@xxxxxxxxx/ > > > > Changes from v4: > > * Fixed the O=<build dir> build issue > > * The tool halts execution upon encountering major issues, thereby ensuring > > the pipeline is interrupted. > > * A cmdline option to specify the source directory added. > > * Minor code adjusments. > > * Tested on mips32 and i386 > > > > https://lore.kernel.org/all/20230919193948.465340-1-alessandro.carminati@xxxxxxxxx/ > > > > Changes from v5: > > * Regex filter extended to all symbols > > * Alias creation extended to module objects > > * Code cleaned and commented > > * kas_alias verbose execution via KAS_ALIAS_DEBUG env variable > > * CONFIG_KALLSYMS_ALIAS_SRCLINE selects KBUILD_BUILTIN to ensure no races > > during modules build > > * Tested on x86_64, aarch64 and i386 > > > > https://lore.kernel.org/all/20230927173516.1456594-1-alessandro.carminati@xxxxxxxxx/ > > > > Changes from v6: > > * Generate a file to facilitate the construction of custom OOT modules using aliases. > > In this context, it is needed to export the modules.symbfreq file, a product of > > the build. > > * Previously, kas_alias was intended to run once, executing all tasks within the same > > session. In this version, two targets have been introduced to split the work into > > distinct phases: one for creating aliases for the core image and another for > > generating aliases for modules. > > * The module aliases production statement has been inserted in the > > scripts/Makefile.modfinal. Instead of being executed individually for each module, > > it now runs once for all modules. > > * The approach for excluding symbols in .init and .exit ELF sections has undergone a > > rewrite. > > * kas_alias is now capable of applying aliases to OOT modules. However, for this to > > occur, the 'modules.symbfreq' file needs to be accessible during the build. > > If it is not found, aliases won't be added. > > > > https://lore.kernel.org/all/20231024201157.748254-1-alessandro.carminati@xxxxxxxxx/ > > > > > > I am concerned with your code. > At least, I'd like you to understand the Makefile principle. > > > > > > <output>: <input> > <command> > > > > The <command> reads <input> and generates <output>. > > The <command> must not modify <input>. > > > If you deviate from this principle, > we will never know where build artifacts are updated. > > > > > Your code looks like this: > > > %.ko: %.o %.mod.o > Your script inserts aliases to %.o > Link %.o and %.mod.o into %.ko > > > > Your script updates the input file, %.o. > This is never acceptable. > > > I truly appreciate your guidance regarding the Makefile principle. I realize now that my self-taught approach overlooked key aspects, particularly the importance of not modifying the input file, and I'm committed to aligning my code with this principle. Before I proceed, I seek a better understanding of how the kernel community aims to address the duplicate symbols issue. > Anyway, @FILENAME_LINE seems a cheesy workaround after all, > and I do not see a good reason to grow the ugliness. Regarding the FILENAME_LINE implementation, my assumption stemmed from notable comments by Luis Chamberlain https://lore.kernel.org/all/ZLVxUQiC5iF+xTPQ@xxxxxxxxxxxxxxxxxxxxxx/ Petr Mladek https://lore.kernel.org/all/ZQFtdJEKJ9taYpA0@alley/ within the threads related to my work, but the idea itself originated elsewhere: the erlies message I'm aware of, carrying this contents is a Petr's message as answer to another thread. https://lore.kernel.org/all/ZEeO_rWL4bDh5vui@alley/ If FILENAME_LINE doesn't serve the community's purpose, and my interpretation was mistaken, I apologize for any inconvenience caused and will cease work on this aspect. > > > > > > > --- > > Makefile | 14 +- > > init/Kconfig | 22 ++ > > scripts/Makefile.modfinal | 30 +- > > scripts/kas_alias.py | 600 ++++++++++++++++++++++++++++++++++++++ > > scripts/link-vmlinux.sh | 27 +- > > 5 files changed, 688 insertions(+), 5 deletions(-) > > create mode 100755 scripts/kas_alias.py > > > > diff --git a/Makefile b/Makefile > > index 99db546fbb45..810a0cd21d20 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -479,6 +479,7 @@ OBJCOPY = $(LLVM_PREFIX)llvm-objcopy$(LLVM_SUFFIX) > > OBJDUMP = $(LLVM_PREFIX)llvm-objdump$(LLVM_SUFFIX) > > READELF = $(LLVM_PREFIX)llvm-readelf$(LLVM_SUFFIX) > > STRIP = $(LLVM_PREFIX)llvm-strip$(LLVM_SUFFIX) > > +ADDR2LINE = $(LLVM_PREFIX)llvm-addr2line$(LLVM_SUFFIX) > > else > > CC = $(CROSS_COMPILE)gcc > > LD = $(CROSS_COMPILE)ld > > @@ -488,6 +489,7 @@ OBJCOPY = $(CROSS_COMPILE)objcopy > > OBJDUMP = $(CROSS_COMPILE)objdump > > READELF = $(CROSS_COMPILE)readelf > > STRIP = $(CROSS_COMPILE)strip > > +ADDR2LINE = $(CROSS_COMPILE)addr2line > > endif > > RUSTC = rustc > > RUSTDOC = rustdoc > > @@ -591,7 +593,7 @@ export RUSTC_BOOTSTRAP := 1 > > export ARCH SRCARCH CONFIG_SHELL BASH HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC HOSTPKG_CONFIG > > export RUSTC RUSTDOC RUSTFMT RUSTC_OR_CLIPPY_QUIET RUSTC_OR_CLIPPY BINDGEN CARGO > > export HOSTRUSTC KBUILD_HOSTRUSTFLAGS > > -export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL > > +export CPP AR NM STRIP OBJCOPY OBJDUMP READELF ADDR2LINE PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL > > export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX > > export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD > > export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE > > @@ -1453,6 +1455,16 @@ KBUILD_BUILTIN := 1 > > modules: vmlinux > > endif > > > > +# *.ko are usually independent of vmlinux, but CONFIG_KALLSYMS_ALIAS_SRCLINE > > +# is another exception. > > +# At the time when vmlinux is being linked, kas_alias operates on both vmlinux > > +# and modules. To prevent races with modules, kas_alias needs to pause operations > > +# while it is executed. > > +ifdef CONFIG_KALLSYMS_ALIAS_SRCLINE > > +KBUILD_BUILTIN := 1 > > +modules: vmlinux > > +endif > > + > > modules: modules_prepare > > > > # Target to prepare building external modules > > diff --git a/init/Kconfig b/init/Kconfig > > index 9ffb103fc927..f68341b29576 100644 > > --- a/init/Kconfig > > +++ b/init/Kconfig > > @@ -1738,6 +1738,28 @@ config KALLSYMS_BASE_RELATIVE > > time constants, and no relocation pass is required at runtime to fix > > up the entries based on the runtime load address of the kernel. > > > > +config KALLSYMS_ALIAS_SRCLINE > > + bool "Produces alias for duplicated text symbols" if EXPERT > > + depends on KALLSYMS && DEBUG_INFO && !DEBUG_INFO_SPLIT > > + help > > + It is not uncommon for drivers or modules related to similar > > + peripherals to have symbols with the exact same name. > > + While this is not a problem for the kernel's binary itself, it > > + becomes an issue when attempting to trace or probe specific > > + functions using infrastructure like ftrace or kprobe. > > + > > + This option addresses this challenge, producing alias for text > > + symbol names that include the file name and line where the symbols > > + are defined in the source code. > > + > > +config KALLSYMS_ALIAS_SRCLINE_DATA > > + bool "Produces alias also for global variables names" > > + depends on KALLSYMS_ALIAS_SRCLINE > > + help > > + Sometimes it can be useful to refer to global vars by name. Since > > + they suffer the same issue as text symbols, this config option > > + allows having aliases for global variables names too. > > + > > # end of the "standard kernel features (expert users)" menu > > > > # syscall, maps, verifier > > diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal > > index 8568d256d6fb..acde29a46c6b 100644 > > --- a/scripts/Makefile.modfinal > > +++ b/scripts/Makefile.modfinal > > @@ -15,6 +15,32 @@ include $(srctree)/scripts/Makefile.lib > > # find all modules listed in modules.order > > modules := $(call read-file, $(MODORDER)) > > > > +ifeq ($(KBUILD_BUILTIN),y) > > + ifeq ($(or $(CONFIG_DEBUG_INFO_BTF_MODULES),$(CONFIG_KALLSYMS_ALIAS_SRCLINE)),y) > > + VMLINUX_DEP := vmlinux > > + endif > > +endif > > + > > +VMLINUX_DEP ?= > > > This line seems meaningless to me. > > > > > > + > > +ifeq ($(CONFIG_KALLSYMS_ALIAS_SRCLINE),y) > > + ifeq ($(CONFIG_KALLSYMS_ALIAS_SRCLINE_DATA),y) > > + KAS_DATA := --process_data > > + endif > > + > > + ifdef KAS_ALIAS_DEBUG > > + DEBUG_FLAG := $(KAS_ALIAS_DEBUG) > > + else > > + DEBUG_FLAG := 0 > > + endif > > + > > + KAS_ALIAS_MODULE_CMD := $(srctree)/scripts/kas_alias.py --symbol_frequency $(extmod_prefix)modules.symbfreq \ > > + --debug $(DEBUG_FLAG) --nm $(NM) $(KAS_DATA) \ > > + --addr2line $(ADDR2LINE) --basedir $(srctree) --separator @ \ > > + single_module \ > > + --objcopy $(OBJCOPY) --objdump $(OBJDUMP) --target-module > > +endif > > + > > __modfinal: $(modules:%.o=%.ko) > > @: > > > > @@ -30,6 +56,7 @@ quiet_cmd_cc_o_c = CC [M] $@ > > > > quiet_cmd_ld_ko_o = LD [M] $@ > > cmd_ld_ko_o += \ > > + [ "$(KAS_ALIAS_MODULE_CMD)" != "" ] && $(KAS_ALIAS_MODULE_CMD) $<; \ > > $(LD) -r $(KBUILD_LDFLAGS) \ > > $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \ > > -T scripts/module.lds -o $@ $(filter %.o, $^) > > @@ -52,7 +79,7 @@ if_changed_except = $(if $(call newer_prereqs_except,$(2))$(cmd-check), \ > > printf '%s\n' 'savedcmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:) > > > > # Re-generate module BTFs if either module's .ko or vmlinux changed > > -%.ko: %.o %.mod.o scripts/module.lds $(and $(CONFIG_DEBUG_INFO_BTF_MODULES),$(KBUILD_BUILTIN),vmlinux) FORCE > > +%.ko: %.o %.mod.o scripts/module.lds $(VMLINUX_DEP) FORCE > > +$(call if_changed_except,ld_ko_o,vmlinux) > > ifdef CONFIG_DEBUG_INFO_BTF_MODULES > > +$(if $(newer-prereqs),$(call cmd,btf_ko)) > > @@ -65,7 +92,6 @@ targets += $(modules:%.o=%.ko) $(modules:%.o=%.mod.o) > > > > PHONY += FORCE > > FORCE: > > - > > # Read all saved command lines and dependencies for the $(targets) we > > # may be building above, using $(if_changed{,_dep}). As an > > # optimization, we don't need to read them if the target does not > > diff --git a/scripts/kas_alias.py b/scripts/kas_alias.py > > new file mode 100755 > > index 000000000000..5b0517771389 > > --- /dev/null > > +++ b/scripts/kas_alias.py > > @@ -0,0 +1,600 @@ > > +#!/usr/bin/env python3 > > +# SPDX-License-Identifier: GPL-2.0-only > > +# > > +# Copyright (C) 2023 Red Hat, Inc. Alessandro Carminati <alessandro.carminati@xxxxxxxxx> > > +# > > +# kas_alias: Adds alias to duplicate symbols for the kallsyms output. > > + > > +import os > > +import re > > +import sys > > +import inspect > > +import argparse > > +import subprocess > > +from enum import Enum > > +from collections import namedtuple > > + > > +# Regex representing symbols that need no alias > > +regex_filter = [ > > + "^__compound_literal\\.[0-9]+$", > > + "^__[wm]*key\\.[0-9]+$", > > + "^_*TRACE_SYSTEM.*$", > > + "^__already_done\\.[0-9]+$", > > + "^__msg\\.[0-9]+$", > > + "^__func__\\.[0-9]+$", > > + "^CSWTCH\\.[0-9]+$", > > + "^_rs\\.[0-9]+$", > > + "^___tp_str\\.[0-9]+$", > > + "^__flags\\.[0-9]+$", > > + "^___done\\.[0-9]+$", > > + "^__print_once\\.[0-9]+$", > > + "^___once_key\\.[0-9]+$", > > + "^__pfx_.*$", > > + "^__cfi_.*$", > > + "^\\.LC[0-9]+$", > > + "^\\.L[0-9]+.[0-9]+$", > > + "^__UNIQUE_ID_.*$", > > + "^symbols\\.[0-9]+$", > > + "^_note_[0-9]+$" > > + ] > > + > > +class DebugLevel(Enum): > > + PRODUCTION = 0 > > + INFO = 1 > > + DEBUG_BASIC = 2 > > + DEBUG_MODULES = 3 > > + DEBUG_ALL = 4 > > > Re-inventing a wheel? > > Python supports the "logging" module. > > > > > > > > + > > +class SeparatorType: > > + def __call__(self, separator): > > + if len(separator) != 1: > > + raise argparse.ArgumentTypeError("Separator must be a single character") > > + return separator > > + > > +Line = namedtuple('Line', ['address', 'type', 'name', 'addr_int']) > > + > > +def get_caller(): > > + """ > > + Used only to produce debug messages: > > + Gets the caller's caller name if any, "kas_alias" otherwise > > + Args: > > + None > > + Returns: > > + A string representing a name of a function. > > + """ > > + stack = inspect.stack() > > + if len(stack) > 2: > > + caller = stack[2][0] > > + caller_name = caller.f_code.co_name > > + return caller_name > > + else: > > + return "kas_alias" > > + > > +def debug_print(config, print_debug_level, text): > > + """ > > + Prints text if current debug level is greater or equal to print_debug_level. > > + Args: > > + current_debug_level: Application debug level specified by command line. > > + print_debug_level: Minimum debug level message should be printed. > > + text: string representing the message. > > + Returns: > > + Nothing. > > + """ > > + if int(config.debug) >= print_debug_level: > > + print(f"{get_caller()}: " + text) > > + > > +def parse_nm_lines(config, lines, name_occurrences=None): > > + """ > > + Parses a given nm output and returns the symbol list along with a hash of > > + symbol occurrences. > > + Args: > > + lines: List of tuples representing one nm line. > > + name_occurrences: Hash having the name as key, used to count names' > > + occurrences. > > + Returns: > > + Creates a new line list proper for the nm output it parsed and, updates > > + the occurrences hash. > > + """ > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, "parse_nm_lines: parse start") > > + > > + if name_occurrences is None: > > + name_occurrences = {} > > + > > + symbol_list = [] > > + > > + for line in lines: > > + fields = line.strip().split() > > + > > + if len(fields) >= 3: > > + address, type, name = fields[0], fields[1], ' '.join(fields[2:]) > > + symbol_list.append(Line(address, type, name, int(address, 16))) > > + name_occurrences[name] = name_occurrences.get(name, 0) + 1 > > + > > + return symbol_list, name_occurrences > > + > > +def start_addr2line_process(binary_file, config): > > + """ > > + Initializes an addr2line server process operating on the given ELF object. > > + Args: > > + binary_file: String representing the binary file name object of addr2line > > + queries. > > + addr2line_file: String representing the addr2line executable name. > > + Returns: > > + Returns addr2line process descriptor. > > + """ > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"Starting addr2line process on {binary_file}") > > + > > + try: > > + addr2line_process = subprocess.Popen([config.addr2line_file, '-fe', > > + binary_file], > > + stdin=subprocess.PIPE, > > + stdout=subprocess.PIPE, > > + stderr=subprocess.PIPE, > > + text=True) > > + return addr2line_process > > + except Exception as e: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(e).__name__}): {str(e)}") > > + sys.exit(-2) > > + > > +def addr2line_fetch_address(config, addr2line_process, address): > > + """ > > + Queries a specific address using the active addr2line process. > > + Args: > > + addr2line_process: Descriptor of the addr2line process that is wanted to > > + handle the query. > > + address: The address of the symbol that needs to be resolved. > > + Returns: > > + Returns a string representing the file and line number where the symbol > > + at the specified address has been defined. The address is normalized > > + before being returned. > > + """ > > + debug_print(config, DebugLevel.DEBUG_ALL.value, f"Resolving {address}") > > + > > + try: > > + addr2line_process.stdin.write(address + '\n') > > + addr2line_process.stdin.flush() > > + addr2line_process.stdout.readline().strip() > > + output = addr2line_process.stdout.readline().strip() > > + > > + return os.path.normpath(output) > > + except Exception as e: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(e).__name__}): {str(e)}") > > + sys.exit(-2) > > + > > +def process_line(line, config, section_map): > > + """ > > + Determines whether a duplicate item requires an alias or not. > > + Args: > > + line: nm line object that needs to be checked. > > + section_map: map correlating symbols and the ELF section they are from > > + Returns: > > + Returns true if the line needs to be processed, false otherwise. > > + """ > > + debug_print(config, DebugLevel.DEBUG_ALL.value, f"Processing {line.address} {line.type} {line.name}") > > + > > + # The module contains symbols that were discarded after being loaded. Typically, > > + # these symbols belong to the initialization function. These symbols have their > > + # address in the init section addresses, so this check prevents these symbols > > + # from being assigned aliases. > > + if section_map != None: > > + if line.name in section_map: > > + if (".init" in section_map[line.name] or ".exit" in section_map[line.name]): > > + return False > > + else: > > + return False > > + > > + if config.process_data_sym: > > + return not (any(re.match(regex, line.name) for regex in regex_filter)) > > + else: > > + return (line.type in {"T", "t"}) and ( > > + not (any(re.match(regex, line.name) for regex in regex_filter))) > > + > > +def fetch_file_lines(config, filename): > > + """ > > + Reads a text file and retrieves its content. > > + Args: > > + filename: String representing the name of the file that needs to be read. > > + Returns: > > + Returns a string list representing the lines read in the file. > > + """ > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"Fetch {filename}") > > + > > + try: > > + with open(filename, 'r') as file: > > + lines = [line.strip() for line in file.readlines()] > > + return lines > > + except FileNotFoundError: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(FileNotFoundError).__name__}): {str(FileNotFoundError)}") > > + sys.exit(-2) > > + > > +def do_nm(filename, config): > > + """ > > + Runs the nm command on a specified file. > > + Args: > > + filename: String representing the name of the file on which nm should > > + run against. > > + nm_executable: String representing the nm executable filename. > > + Returns: > > + Returns a strings list representing the nm output. > > + """ > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"executing {config.nm_file} -n {filename}") > > + > > + try: > > + nm_output = subprocess.check_output([config.nm_file, '-n', filename], > > + universal_newlines=True, stderr=subprocess.STDOUT).splitlines() > > + return nm_output > > + except subprocess.CalledProcessError as e: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(e).__name__}): {str(e)}") > > + sys.exit(-2) > > + > > +def make_objcpy_arg(config, line, decoration, section_map): > > + """ > > + Produces an objcopy argument statement for a single alias to be added in a > > + module. > > + Args: > > + line: nm line object target for this iteration. > > + decoration: String representing the decoration (normalized addr2line > > + output) to be added at the symbol name to have the alias. > > + section_map: map correlating symbols and the ELF section they are from > > + Returns: > > + Returns a string that directly maps the argument string objcopy should > > + use to add the alias. > > + """ > > + try: > > + flag = "global" if line.type.isupper() else "local" > > + debug_print(config, DebugLevel.DEBUG_MODULES.value, > > + f"{line.name + decoration}={section_map[line.name]}:0x{line.address},{flag}") > > + return ( > > + "--add-symbol " > > + f"{line.name + decoration}={section_map[line.name]}:0x{line.address},{flag} " > > + ) > > + except Exception: > > + debug_print(config, DebugLevel.PRODUCTION.value, > > + f"make_objcpy_arg warning: Skip alias for {line.name}" > > + f" type {line.type} because no corresponding section found.") > > + return "" > > + > > +def execute_objcopy(config, objcopy_args, object_file): > > + """ > > + Uses objcopy to add aliases to a given module object file. > > + Since objcopy can't operate in place, the original object file is renamed > > + before operating on it. At function end, a new object file having the old > > + object's name is carrying the aliases for the duplicate symbols. > > + Args: > > + objcopy_executable: String representing the object copy executable file. > > + objcopy_args: Arguments (aliases to add to the object file) to be used > > + in the objcopy execution command line. > > + object_file: Target object file (module object file) against which objcopy is executed. > > + Returns: > > + Nothing is returned, but as a side effect of this function execution, > > + the module's object file contains the aliases for duplicated symbols. > > + """ > > + # Rename the original object file by adding a suffix > > + backup_file = object_file + '.orig' > > + debug_print(config, DebugLevel.DEBUG_MODULES.value, f"rename {object_file} to {backup_file}") > > + os.rename(object_file, backup_file) > > + > > + full_command = ( > > + f"{config.objcopy_file} " > > + f"{objcopy_args} {backup_file} {object_file}" > > + ) > > + debug_print(config, DebugLevel.DEBUG_MODULES.value, f"executing {full_command}") > > + > > + try: > > + subprocess.run(full_command, shell=True, check=True) > > + except subprocess.CalledProcessError as e: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(e).__name__}): {str(e)}") > > + sys.exit(-2) > > + > > +def generate_decoration(line, config, addr2line_process): > > + """ > > + Generates symbol decoration to be used to make the alias name, by > > + querying addr2line. > > + Args: > > + line: nm line object that needs an alias. > > + config: Object containing command line configuration. > > + addr2line_process: Descriptor of the addr2line process that serves > > + the binary object where the symbol belongs. > > + Returns: > > + Returns a string representing the decoration for the given symbol, > > + or empty string if this can not be done. E.g., addr2line can't find > > + the point where the symbol is defined. > > + """ > > + output = addr2line_fetch_address(config, addr2line_process, line.address) > > + base_dir = config.linux_base_dir + "/" > > + cwd = os.getcwd() + "/" > > + absolute_base_dir = os.path.abspath(os.path.join(cwd, base_dir)) > > + > > + if output.startswith(base_dir): > > + output = output[len(base_dir):] > > + > > + if output.startswith(absolute_base_dir): > > + output = output[len(absolute_base_dir):] > > + > > + if output.startswith('/'): > > + output = output[1:] > > + > > + decoration = config.separator + "".join( > > + "_" if not c.isalnum() else c for c in output > > + ) > > + # The addr2line can emit the special string "?:??" when addr2line can not find the > > + # specified address in the DWARF section that after normalization it becomes "____". > > + # In such cases, emitting an alias wouldn't make sense, so it is skipped. > > + if decoration != config.separator + "____": > > + return decoration > > + return "" > > + > > +def section_interesting(section): > > + """ > > + checks if a section is of interest. > > + Args: > > + section: string representing the section needed to be tested. > > + Returns: > > + True if it is, False otherwise. > > + """ > > + sections_regex = [r".text", r".data", r".bss", r".rodata"] > > + > > + for pattern in sections_regex: > > + if re.search(pattern, section): > > + return True > > + return False > > + > > +def get_symbol2section(config, file_to_operate): > > + """ > > + This function aims to produce a map{symbol_name]=section_name for > > + any given object file. > > + Args: > > + objdump_executable: String representing the objdump executable. > > + file_to_operate: file whose section names are wanted. > > + Returns: > > + Returns a map, where the key is the symbol name and the value is > > + a section name. > > + """ > > + try: > > + output = subprocess.check_output( > > + [config.objdump_file, '-h', file_to_operate], > > + universal_newlines=True) > > + section_pattern = re.compile(r'^ *[0-9]+ ([.a-z_]+) +([0-9a-f]+).*$', re.MULTILINE) > > + section_names = section_pattern.findall(output) > > + result = {} > > + for section, section_siza in section_names: > > + if int(section_siza, 16) != 0 and section_interesting(section): > > + debug_print(config, DebugLevel.DEBUG_ALL.value, f"CMD => {config.objdump_file} -tj {section} {file_to_operate}") > > + try: > > + output = subprocess.check_output( > > + [config.objdump_file, '-tj', section, file_to_operate], > > + universal_newlines=True) > > + except subprocess.CalledProcessError: > > + pass > > + func_names_pattern = re.compile(r'[0-9a-f]+.* ([.a-zA-Z_][.A-Za-z_0-9]+)$', re.MULTILINE) > > + matches = func_names_pattern.findall(output) > > + for func_name in matches: > > + result[func_name] = section > > + > > + > > + except Exception as e: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(e).__name__}): {str(e)}") > > + sys.exit(-2) > > + return result > > + > > +def produce_output_modules(config, symbol_list, name_occurrences, > > + module_file_name, addr2line_process): > > + """ > > + Computes the alias addition on a given module object file. > > + Args: > > + config: Object containing command line configuration. > > + symbol_list: List of tuples representing nm lines for the given object > > + file. > > + name_occurrences: Hash that stores symbol occurrences for the build. > > + module_file_name: String representing the target moule object file. > > + addr2line_process: Descriptor of the addr2line process that is wanted to > > + handle the query. > > + Returns: > > + Nothing is returned, but as a side effect of this function execution, > > + the module's object file contains the aliases for duplicated symbols. > > + """ > > + debug_print(config, DebugLevel.DEBUG_ALL.value, "produce_output_modules computation starts here ") > > + objcopy_args = ""; > > + args_cnt = 0 > > + section_map = get_symbol2section(config, module_file_name) > > + for module_symbol in symbol_list: > > + debug_print(config, DebugLevel.DEBUG_ALL.value, f"--> Processing {module_symbol}") > > + try: > > + if (name_occurrences[module_symbol.name] > 1) and process_line(module_symbol, config, section_map): > > + decoration = generate_decoration(module_symbol, config, addr2line_process) > > + debug_print(config, DebugLevel.DEBUG_ALL.value, f"--- {module_symbol} occurred multiple times and is a candidate for alias: decoration '{decoration}'") > > + if decoration != "": > > + objcopy_args = objcopy_args + make_objcpy_arg(config, module_symbol, decoration, section_map) > > + args_cnt = args_cnt + 1 > > + if args_cnt > 50: > > + debug_print(config, DebugLevel.DEBUG_MODULES.value, "Number of arguments high, split objcopy" > > + " call into multiple statements.") > > + execute_objcopy(config, objcopy_args, module_file_name) > > + args_cnt = 0> + objcopy_args = "" > > + except KeyError: > > + pass > > + execute_objcopy(config, objcopy_args, module_file_name) > > > > So, you call execute_objcopy() to do > "rename <mod> <mod>.orig" and "objcopy <mod>.orig <mod>" > even when no alias is inserted? > > > > > > > > + > > +def produce_output_vmlinux(config, symbol_list, name_occurrences, addr2line_process): > > + """ > > + Computes the alias addition for the core Linux on image. > > + Args: > > + config: Object containing command line configuration. > > + symbol_list: List of tuples representing nm lines for the given object > > + file. > > + name_occurrences: Hash that stores symbol occurreces for the build. > > + addr2line_process: Descriptor of the addr2line process that is wanted to > > + handle the query. > > + Returns: > > + Nothing is returned, but as a side effect of this function execution, > > + the core kernel image contains the aliases for duplicated symbols. > > + """ > > + with open(config.output_file, 'w') as output_file: > > + for obj in symbol_list: > > + output_file.write(f"{obj.address} {obj.type} {obj.name}\n") > > + if (name_occurrences[obj.name] > 1) and process_line(obj, config, None): > > + decoration = generate_decoration(obj, config, addr2line_process) > > + debug_print(config,DebugLevel.DEBUG_ALL.value, f"Symbol {obj.name} appears multiple times, and decoration is {decoration}") > > + if decoration != "": > > + debug_print(config, DebugLevel.DEBUG_ALL.value, f"Writing on {config.output_file} the additional '{obj.address} {obj.type} {obj.name + decoration}'") > > + output_file.write(f"{obj.address} {obj.type} {obj.name + decoration}\n") > > + > > +def read_name_occurrences(config): > > + """ > > + Reads symbol frequencies from the file specified in the 'config' argument. > > + > > + If the file is not found, it gracefully returns an empty map. > > + > > + Args: > > + config: A configuration object or dictionary containing necessary information > > + to locate the file. > > + > > + Returns: > > + A map where keys represent symbol names and values represent their frequencies. > > + """ > > + name_occurrences = {} > > + # This code reads occurrences of symbol names from a file containing both the core image > > + # and modules frequencies resulted from the computation of the "core_image" action. > > + # It reads from the file specified by command-line arguments; if the file doesn't exist > > + # or the filename isn't specified, it returns an empty map. > > + # The code relies on accessing and reading config.symbol_frequency_file containing > > + # symbol name frequencies. > > + # In a complete build, this file is generated during the "core image" action earlier > > + # in the build process. > > + # However, when building a custom OOT module, it is needed to ensure that this file > > + # is accessible in the current directory where the module source code is being built. > > + # Not having this file result in a module that have no aliases even if they are needed > > + if config.symbol_frequency_file is not None: > > + try: > > + with open(config.symbol_frequency_file, 'r') as file: > > + for line in file: > > + key, value = line.strip().split(':') > > + name_occurrences[key]=int(value) > > + except FileNotFoundError: > > + pass > > + > > + return name_occurrences > > + > > +def check_aliases(config, module_nm_lines): > > + """ > > + Flags modules that already have aliases based on the given 'module_nm_lines'. > > + > > + This function takes in configuration details and a list of strings representing > > + the 'nm' command output for a specific module. It detects instances where a module > > + already possesses an alias, which might occur after a build interruption and restart. > > + > > + The detection logic is straightforward: it examines if the separator character is > > + present in the symbol name. If found, it uses this separator to check if the > > + previous symbol shares the same name. This detection assumes 'nm' is invoked with > > + the '-n' flag, ensuring symbol sorting. > > + > > + Args: > > + config: A configuration object or dictionary containing necessary information. > > + module_nm_lines: A list of strings representing the 'nm' command output for a module. > > + > > + Returns: > > + True if the module_nm_lines contains aliases, False otherwise. > > + """ > > + prev = None > > + for line in module_nm_lines: > > + if ('@' in line.name and line.name.split(config.separator)[0] == prev): > > + return False > > + prev = line.name > > + return True > > + > > +def main(): > > + # Handles command-line arguments and generates a config object > > + parser = argparse.ArgumentParser(description='Add alias to multiple occurring symbols name in kallsyms') > > + subparsers = parser.add_subparsers(title='Subcommands', dest='action') > > + core_image_parser = subparsers.add_parser('core_image', help='Operates for in tree computation.') > > + core_image_parser.add_argument('-n', "--nmdata", dest="nm_data_file", required=True, help="Set vmlinux nm output file to use for core image.") > > + core_image_parser.add_argument('-o', "--outfile", dest="output_file", required=True, help="Set the vmlinux nm output file containing aliases.") > > + core_image_parser.add_argument('-v', "--vmlinux", dest="vmlinux_file", required=True, help="Set the vmlinux core image file.") > > + core_image_parser.add_argument('-m', "--modules_list", dest="module_list", required=True, help="Set the file containing the list of the modules object files.") > > + > > + single_module_parser = subparsers.add_parser('single_module', help='Operates for out of tree computation.') > > + single_module_parser.add_argument('-c', "--objcopy", dest="objcopy_file", required=True, help="Set the objcopy executable to be used.") > > + single_module_parser.add_argument('-u', "--objdump", dest="objdump_file", required=True, help="Set objdump executable to be used.") > > > Usually dest= should be omitted. > > You can access config.objcopy, config.objdumpm, > instead of config.objcopy_file, config_objdump_file. > > > > > > > > + single_module_parser.add_argument('-q', "--target-module", dest="target_module", required=False, help="Sets a tharget module to operate.") > > + > > + parser.add_argument('-j', "--symbol_frequency", dest="symbol_frequency_file", required=True, help="Specify the symbol frequency needed to use for producing aliases") > > + parser.add_argument('-z', "--debug", dest="debug", required=False, help="Set the debug level.", choices=[f"{level.value}" for level in DebugLevel], default="1" ) > > + parser.add_argument('-a', "--addr2line", dest="addr2line_file", required=True, help="Set the addr2line executable to be used.") > > + parser.add_argument('-b', "--basedir", dest="linux_base_dir", required=True, help="Set base directory of the source kernel code.") > > + parser.add_argument('-s', "--separator", dest="separator", required=False, help="Set separator, character that separates original name from the addr2line data in alias symbols.", default="@", type=SeparatorType()) > > + parser.add_argument('-d', "--process_data", dest="process_data_sym", required=False, help="Requires the tool to process data symbols along with text symbols.", action='store_true') > > + parser.add_argument('-e', "--nm", dest="nm_file", required=True, help="Set the nm executable to be used.") > > + > > + config = parser.parse_args() > > + > > + try: > > + # The core_image target is utilized for gathering symbol statistics from the core image and modules, > > + # generating aliases for the core image. This target is designed to be invoked from scripts/link-vmlinux.sh > > + if config.action == 'core_image': > > + debug_print(config, DebugLevel.INFO.value,"Start core_image processing") > > + > > + # Determine kernel source code base directory > > + if not config.linux_base_dir.startswith('/'): > > + config.linux_base_dir = os.path.normpath(os.getcwd() + "/" + config.linux_base_dir) + "/" > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"Configuration: {config}") > > + > > + debug_print(config, DebugLevel.INFO.value, "Process nm data from vmlinux") > > + # Process nm data from vmlinux > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"fetch_file_lines({config.nm_data_file})") > > + vmlinux_nm_lines = fetch_file_lines(config, config.nm_data_file) > > + vmlinux_symbol_list, name_occurrences = parse_nm_lines(config, vmlinux_nm_lines) > > + > > + debug_print(config, DebugLevel.INFO.value,"Process nm data for modules") > > + # Process nm data for modules > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"fetch_file_lines({config.nm_data_file})") > > + module_list = fetch_file_lines(config, config.module_list) > > + module_symbol_list = {} > > + for module in module_list: > > + module_nm_lines = do_nm(module, config) > > + module_symbol_list[module], name_occurrences = parse_nm_lines(config, module_nm_lines, name_occurrences) > > + > > + debug_print(config, DebugLevel.INFO.value, f"Save name_occurrences data: {config.symbol_frequency_file}") > > + with open(config.symbol_frequency_file, 'w') as file: > > + for key, value in name_occurrences.items(): > > + file.write(f"{key}:{value}\n") > > + > > + debug_print(config, DebugLevel.INFO.value, "Produce file for vmlinux") > > + # Produce file for vmlinux > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"addr2line_process({config.vmlinux_file}, {config.addr2line_file})") > > + addr2line_process = start_addr2line_process(config.vmlinux_file, config) > > + produce_output_vmlinux(config, vmlinux_symbol_list, name_occurrences, addr2line_process) > > + addr2line_process.stdin.close() > > + addr2line_process.stdout.close() > > + addr2line_process.stderr.close() > > + addr2line_process.wait() > > + > > + # Expects to be called from scripts/Makefile.modfinal > > + elif config.action == 'single_module': > > + debug_print(config, DebugLevel.INFO.value,"Start single_module processing") > > + # read simbol name frequency file > > + name_occurrences = read_name_occurrences(config) > > + # scan current module > > + module_nm_lines = do_nm(config.target_module, config) > > + mudule_nm_data, _ = parse_nm_lines(config, module_nm_lines) > > + if check_aliases(config, mudule_nm_data,): > > + debug_print(config, DebugLevel.DEBUG_BASIC.value, f"addr2line_process({config.target_module}, {config.addr2line_file})") > > + addr2line_process = start_addr2line_process(config.target_module, config) > > + debug_print(config, DebugLevel.DEBUG_BASIC.value,"adding aliases to module") > > + produce_output_modules(config, mudule_nm_data, name_occurrences, config.target_module, addr2line_process) > > + addr2line_process.stdin.close() > > + addr2line_process.stdout.close() > > + addr2line_process.stderr.close() > > + addr2line_process.wait() > > + else: > > + debug_print(config, DebugLevel.INFO.value,"module is already aliased, skipping") > > + > > + else: > > + raise SystemExit("Script terminated: unknown action") > > + > > + except Exception as e: > > + debug_print(config, DebugLevel.PRODUCTION.value, f"Script terminated due to an error ({type(e).__name__}): {str(e)}") > > + sys.exit(-2) > > + > > +if __name__ == "__main__": > > + main() > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh > > index a432b171be82..31afac64d7ed 100755 > > --- a/scripts/link-vmlinux.sh > > +++ b/scripts/link-vmlinux.sh > > @@ -91,7 +91,12 @@ vmlinux_link() > > > > # The kallsyms linking does not need debug symbols included. > > if [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then > > - ldflags="${ldflags} ${wl}--strip-debug" > > + # The kallsyms linking does not need debug symbols included, > > + # unless the KALLSYMS_ALIAS_SRCLINE. > > + if ! is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE && \ > > + [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then > > > This is the same condition as checked 5 lines above, isn't it? > > > > > > > + ldflags="${ldflags} ${wl}--strip-debug" > > + fi > > fi > > > > if is_enabled CONFIG_VMLINUX_MAP; then > > @@ -161,7 +166,24 @@ kallsyms() > > fi > > > > info KSYMS ${2} > > - scripts/kallsyms ${kallsymopt} ${1} > ${2} > > + ALIAS="" > > + KAS_DATA="" > > + if is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE_DATA; then > > + KAS_DATA="--process_data" > > + fi > > + if is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE; then > > + ALIAS=".alias" > > + # You can use KAS_ALIAS_DEBUG=<debug level> in the make statements to enable > > + # verbose execution for kas_alias. > > + ${srctree}/scripts/kas_alias.py --symbol_frequency modules.symbfreq \ > > + --debug ${KAS_ALIAS_DEBUG:-0} \ > > + --addr2line ${ADDR2LINE} --basedir ${srctree} --nm ${NM} \ > > + --separator @ ${KAS_DATA} \ > > + core_image \ > > + --modules_list ${MODORDER} --vmlinux ${kallsyms_vmlinux} \ > > + --nmdata ${1} --outfile ${1}${ALIAS} > > + fi > > + scripts/kallsyms ${kallsymopt} ${1}${ALIAS} > ${2} > > > > I do not understand your quoting policy. > > In shell, there is no need to quote literals, > need to quote variables. > > You are doing the opposite. > > KAS_DATA="--process_data" > --> KAS_DATA=--process_data > > > > ${ADDR2LINE} --> "${ADDR2LINE}" > > > > > > > > } > > > > # Perform one step in kallsyms generation, including temporary linking of > > @@ -203,6 +225,7 @@ cleanup() > > rm -f System.map > > rm -f vmlinux > > rm -f vmlinux.map > > + find . -type f -name "*.orig" -exec rm {} \; > > > I am upset with deleting files with a well known suffix like this. > > When the 'patch' command fails, *.orig and *.rej will be created. > > > > > > > > > } > > > > # Use "make V=1" to debug this script > > -- > > 2.34.1 > > > > > -- > Best Regards > Masahiro Yamada Best Regards Alessandro Carminati