Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 27 Sep 2023 17:35:16 +0000
"Alessandro Carminati (Red Hat)" <alessandro.carminati@xxxxxxxxx> wrote:

> It is not uncommon for drivers or modules related to similar peripherals
> to have symbols with the exact same name.
> While this is not a problem for the kernel's binary itself, it becomes an
> issue when attempting to trace or probe specific functions using
> infrastructure like ftrace or kprobe.
> 
> The tracing subsystem relies on the `nm -n vmlinux` output, which provides
> symbol information from the kernel's ELF binary. However, when multiple
> symbols share the same name, the standard nm output does not differentiate
> between them. This can lead to confusion and difficulty when trying to
> probe the intended symbol.
> 
>  ~ # cat /proc/kallsyms | grep " name_show"
>  ffffffff8c4f76d0 t name_show
>  ffffffff8c9cccb0 t name_show
>  ffffffff8cb0ac20 t name_show
>  ffffffff8cc728c0 t name_show
>  ffffffff8ce0efd0 t name_show
>  ffffffff8ce126c0 t name_show
>  ffffffff8ce1dd20 t name_show
>  ffffffff8ce24e70 t name_show
>  ffffffff8d1104c0 t name_show
>  ffffffff8d1fe480 t name_show
> 
> kas_alias addresses this challenge by enhancing symbol names with
> meaningful suffixes generated from the source file and line number
> during the kernel build process.
> These newly generated aliases provide tracers with the ability to
> comprehend the symbols they are interacting with when utilizing the
> ftracefs interface.
> This approach may also allow for the probing by name of previously
> inaccessible symbols.
> 
>  ~ # cat /proc/kallsyms | grep gic_mask_irq
>  ffffd15671e505ac t gic_mask_irq
>  ffffd15671e505ac t gic_mask_irq@drivers_irqchip_irq_gic_c_167
>  ffffd15671e532a4 t gic_mask_irq
>  ffffd15671e532a4 t gic_mask_irq@drivers_irqchip_irq_gic_v3_c_407
>  ~ #
> 
> Changes from v1:
> - Integrated changes requested by Masami to exclude symbols with prefixes
>   "_cfi" and "_pfx".
> - Introduced a small framework to handle patterns that need to be excluded
>   from the alias production.
> - Excluded other symbols using the framework.
> - Introduced the ability to discriminate between text and data symbols.
> - Added two new config symbols in this version: CONFIG_KALLSYMS_ALIAS_DATA,
>   which allows data for data, and CONFIG_KALLSYMS_ALIAS_DATA_ALL, which
>   excludes all filters and provides an alias for each duplicated symbol.
> 
> https://lore.kernel.org/all/20230711151925.1092080-1-alessandro.carminati@xxxxxxxxx/
> 
> Changes from v2:
> - Alias tags are created by querying DWARF information from the vmlinux.
> - The filename + line number is normalized and appended to the original
>   name.
> - The tag begins with '@' to indicate the symbol source.
> - Not a change, but worth mentioning, since the alias is added to the
>   existing list, the old duplicated name is preserved, and the livepatch
>   way of dealing with duplicates is maintained.
> - Acknowledging the existence of scenarios where inlined functions
>   declared in header files may result in multiple copies due to compiler
>   behavior, though it is not actionable as it does not pose an operational
>   issue.
> - Highlighting a single exception where the same name refers to different
>   functions: the case of "compat_binfmt_elf.c," which directly includes
>   "binfmt_elf.c" producing identical function copies in two separate
>   modules.
> 
> https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@xxxxxxxxx/
> 
> Changes from v3:
> - kas_alias was rewritten in Python to create a more concise and
>   maintainable codebase.
> - The previous automation process used by kas_alias to locate the vmlinux
>   and the addr2line has been replaced with an explicit command-line switch
>   for specifying these requirements.
> - addr2line has been added into the main Makefile.
> - A new command-line switch has been introduced, enabling users to extend
>   the alias to global data names.
> 
> https://lore.kernel.org/all/20230828080423.3539686-1-alessandro.carminati@xxxxxxxxx/
> 
> Changes from v4:
> - Fixed the O=<build dir> build issue
> - The tool halts execution upon encountering major issues, thereby ensuring
>   the pipeline is interrupted.
> - A cmdline option to specify the source directory added.
> - Minor code adjusments.
> - Tested on mips32 and i386
> 
> https://lore.kernel.org/all/20230919193948.465340-1-alessandro.carminati@xxxxxxxxx/
> 
> NOTE:
> About the symbols name duplication that happens as consequence of the
> inclusion compat_binfmt_elf.c does, it is evident that this corner is
> inherently challenging the addr2line approach.
> Attempting to conceal this limitation would be counterproductive.
> 
> compat_binfmt_elf.c includes directly binfmt_elf.c, addr2line can't help
> but report all functions and data declared by that file, coming from
> binfmt_elf.c.
> 
> My position is that, rather than producing a more complicated pipeline
> to handle this corner case, it is better to fix the compat_binfmt_elf.c
> anomaly.
> 
> This patch does not deal with the two potentially problematic symbols
> defined by compat_binfmt_elf.c

Thanks for update! I confirmed that the suffixes on the same-name symbols.
But I have one comment below;

> 
> Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@xxxxxxxxx>
> ---
>  Makefile                |   4 +-
>  init/Kconfig            |  22 +++++++
>  scripts/kas_alias.py    | 136 ++++++++++++++++++++++++++++++++++++++++
>  scripts/link-vmlinux.sh |  21 ++++++-
>  4 files changed, 180 insertions(+), 3 deletions(-)
>  create mode 100755 scripts/kas_alias.py
> 
> diff --git a/Makefile b/Makefile
> index 4f283d915e54..f33c179f4cc3 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -488,6 +488,7 @@ OBJCOPY		= $(LLVM_PREFIX)llvm-objcopy$(LLVM_SUFFIX)
>  OBJDUMP		= $(LLVM_PREFIX)llvm-objdump$(LLVM_SUFFIX)
>  READELF		= $(LLVM_PREFIX)llvm-readelf$(LLVM_SUFFIX)
>  STRIP		= $(LLVM_PREFIX)llvm-strip$(LLVM_SUFFIX)
> +ADDR2LINE	= $(LLVM_PREFIX)llvm-addr2line$(LLVM_SUFFIX)
>  else
>  CC		= $(CROSS_COMPILE)gcc
>  LD		= $(CROSS_COMPILE)ld
> @@ -497,6 +498,7 @@ OBJCOPY		= $(CROSS_COMPILE)objcopy
>  OBJDUMP		= $(CROSS_COMPILE)objdump
>  READELF		= $(CROSS_COMPILE)readelf
>  STRIP		= $(CROSS_COMPILE)strip
> +ADDR2LINE	= $(CROSS_COMPILE)addr2line
>  endif
>  RUSTC		= rustc
>  RUSTDOC		= rustdoc
> @@ -611,7 +613,7 @@ export RUSTC_BOOTSTRAP := 1
>  export ARCH SRCARCH CONFIG_SHELL BASH HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC HOSTPKG_CONFIG
>  export RUSTC RUSTDOC RUSTFMT RUSTC_OR_CLIPPY_QUIET RUSTC_OR_CLIPPY BINDGEN CARGO
>  export HOSTRUSTC KBUILD_HOSTRUSTFLAGS
> -export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
> +export CPP AR NM STRIP OBJCOPY OBJDUMP READELF ADDR2LINE PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
>  export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX
>  export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD
>  export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
> diff --git a/init/Kconfig b/init/Kconfig
> index 6d35728b94b2..d45dd423e1ec 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1738,6 +1738,28 @@ config KALLSYMS_BASE_RELATIVE
>  	  time constants, and no relocation pass is required at runtime to fix
>  	  up the entries based on the runtime load address of the kernel.
>  
> +config KALLSYMS_ALIAS_SRCLINE
> +	bool "Produces alias for duplicated text symbols" if EXPERT
> +	depends on KALLSYMS && DEBUG_INFO && !DEBUG_INFO_SPLIT
> +	help
> +	  It is not uncommon for drivers or modules related to similar
> +	  peripherals to have symbols with the exact same name.
> +	  While this is not a problem for the kernel's binary itself, it
> +	  becomes an issue when attempting to trace or probe specific
> +	  functions using infrastructure like ftrace or kprobe.
> +
> +	  This option addresses this challenge, producing alias for text
> +	  symbol names that include the file name and line where the symbols
> +	  are defined in the source code.
> +
> +config KALLSYMS_ALIAS_SRCLINE_DATA
> +	bool "Produces alias also for global variables names"
> +	depends on KALLSYMS_ALIAS_SRCLINE
> +	help
> +	  Sometimes it can be useful to refer to global vars by name. Since
> +	  they suffer the same issue as text symbols, this config option
> +	  allows having aliases for global variables names too.
> +
>  # end of the "standard kernel features (expert users)" menu
>  
>  # syscall, maps, verifier
> diff --git a/scripts/kas_alias.py b/scripts/kas_alias.py
> new file mode 100755
> index 000000000000..7c6b7045081c
> --- /dev/null
> +++ b/scripts/kas_alias.py
> @@ -0,0 +1,136 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Copyright (C) 2023 Red Hat, Inc. Alessandro Carminati <alessandro.carminati@xxxxxxxxx>
> +#
> +# kas_alias: Adds alias to duplicate symbols in the kallsyms output.
> +
> +import subprocess
> +import sys
> +import os
> +import argparse
> +import re
> +from collections import namedtuple
> +
> +regex_filter = [
> +        "^__compound_literal\\.[0-9]+$",
> +        "^__[wm]*key\\.[0-9]+$",
> +        "^_*TRACE_SYSTEM.*$",
> +        "^__already_done\\.[0-9]+$",
> +        "^__msg\\.[0-9]+$",
> +        "^__func__\\.[0-9]+$",
> +        "^CSWTCH\\.[0-9]+$",
> +        "^_rs\\.[0-9]+$",
> +        "^___tp_str\\.[0-9]+$",
> +        "^__flags\\.[0-9]+$",
> +        "^___done\\.[0-9]+$",
> +        "^__print_once\\.[0-9]+$",
> +        "^___once_key\\.[0-9]+$",
> +        "^__pfx_.*$",
> +        "^__cfi_.*$"
> +        ]

I saw __pfx_ symbols have suffixes too, this is because this filter is
only applied when the kas_alias is enabled for data symbols .

> +
> +class SeparatorType:
> +    def __call__(self, separator):
> +        if len(separator) != 1:
> +            raise argparse.ArgumentTypeError("Separator must be a single character")
> +        return separator
> +
> +Line = namedtuple('Line', ['address', 'type', 'name'])
> +
> +def parse_file(filename):
> +    symbol_list = []
> +    name_occurrences = {}
> +
> +    with open(filename, 'r') as file:
> +        for line in file:
> +            fields = line.strip().split()
> +
> +            if len(fields) >= 3:
> +                address, type, name = fields[0], fields[1], ' '.join(fields[2:])
> +                symbol_list.append(Line(address, type, name))
> +                name_occurrences[name] = name_occurrences.get(name, 0) + 1
> +
> +    return symbol_list, name_occurrences
> +
> +def find_duplicate(symbol_list, name_occurrences):
> +    name_to_lines = {}
> +    duplicate_lines = []
> +
> +    for line in symbol_list:
> +        if line.name in name_to_lines:
> +            first_occurrence = name_to_lines[line.name]
> +            duplicate_lines.extend([first_occurrence, line])
> +        else:
> +            name_to_lines[line.name] = line
> +
> +    return duplicate_lines
> +
> +def start_addr2line_process(binary_file, addr2line_file):
> +    try:
> +        addr2line_process = subprocess.Popen([addr2line_file, '-fe', binary_file],
> +                                             stdin=subprocess.PIPE,
> +                                             stdout=subprocess.PIPE,
> +                                             stderr=subprocess.PIPE,
> +                                             text=True)
> +        return addr2line_process
> +    except Exception as e:
> +        print(f"Error starting addr2line process: {str(e)}")
> +        sys.exit(1)
> +
> +def addr2line_fetch_address(addr2line_process, address):
> +    try:
> +        addr2line_process.stdin.write(address + '\n')
> +        addr2line_process.stdin.flush()
> +        addr2line_process.stdout.readline().strip()
> +        output = addr2line_process.stdout.readline().strip()
> +
> +        return os.path.normpath(output)
> +    except Exception as e:
> +        print(f"Error communicating with addr2line: {str(e)}")
> +        sys.exit(1)
> +
> +def process_line(obj, config):
> +    if config:
> +        return not (any(re.match(regex, obj.name) for regex in regex_filter))
> +    else:
> +        return obj.type in {"T", "t"}

Here is the problem. The regex_filter check must be done for all text symbols
if config == False. (BTW, this 'config' is too generic name, 'text_only' will
be more readable.)

Thank you,

> +
> +if __name__ == "__main__":
> +    parser = argparse.ArgumentParser(description='Add alias to multiple occurring symbols name in kallsyms')
> +    parser.add_argument('-a', "--addr2line", dest="addr2line_file", required=True)
> +    parser.add_argument('-v', "--vmlinux", dest="vmlinux_file", required=True)
> +    parser.add_argument('-o', "--outfile", dest="output_file", required=True)
> +    parser.add_argument('-n', "--nmdata", dest="nm_data_file", required=True)
> +    parser.add_argument('-b', "--basedir", dest="linux_base_dir", required=True)
> +    parser.add_argument('-s', "--separator", dest="separator", required=False, default="@", type=SeparatorType())
> +    parser.add_argument('-d', "--data", dest="include_data", required=False, action='store_true')
> +    config = parser.parse_args()
> +
> +    try:
> +        config.linux_base_dir = os.path.normpath(os.getcwd() + "/" + config.linux_base_dir) + "/"
> +        symbol_list, name_occurrences = parse_file(config.nm_data_file)
> +        addr2line_process = start_addr2line_process(config.vmlinux_file, config.addr2line_file)
> +
> +        with open(config.output_file, 'w') as file:
> +            for obj in symbol_list:
> +                file.write(f"{obj.address} {obj.type} {obj.name}\n")
> +                if (name_occurrences[obj.name] > 1) and process_line(obj, config.include_data) :
> +                    output = addr2line_fetch_address(addr2line_process, obj.address)
> +                    decoration = config.separator + "".join(
> +                        "_" if not c.isalnum() else c for c in output.replace(config.linux_base_dir, "")
> +                    )
> +                    # The addr2line can emit the special string "?:??" when addr2line can not find the
> +                    # specified address in the DWARF section that after normalization it becomes "____".
> +                    # In such cases, emitting an alias wouldn't make sense, so it is skipped.
> +                    if decoration != config.separator + "____":
> +                        file.write(f"{obj.address} {obj.type} {obj.name + decoration}\n")
> +
> +        addr2line_process.stdin.close()
> +        addr2line_process.stdout.close()
> +        addr2line_process.stderr.close()
> +        addr2line_process.wait()
> +
> +    except Exception as e:
> +        print(f"An error occurred: {str(e)}")
> +        raise SystemExit("Script terminated due to an error")
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index a432b171be82..c110b0f58a19 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -91,7 +91,12 @@ vmlinux_link()
>  
>  	# The kallsyms linking does not need debug symbols included.
>  	if [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then
> -		ldflags="${ldflags} ${wl}--strip-debug"
> +		# The kallsyms linking does not need debug symbols included,
> +		# unless the KALLSYMS_ALIAS_SRCLINE.
> +		if ! is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE && \
> +		   [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then
> +			ldflags="${ldflags} ${wl}--strip-debug"
> +		fi
>  	fi
>  
>  	if is_enabled CONFIG_VMLINUX_MAP; then
> @@ -161,7 +166,19 @@ kallsyms()
>  	fi
>  
>  	info KSYMS ${2}
> -	scripts/kallsyms ${kallsymopt} ${1} > ${2}
> +	ALIAS=""
> +	KAS_DATA=""
> +	if is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE_DATA; then
> +		KAS_DATA="--data"
> +	fi
> +	if is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE; then
> +		ALIAS=".alias"
> +		${srctree}/scripts/kas_alias.py \
> +			--addr2line ${ADDR2LINE} --vmlinux ${kallsyms_vmlinux} \
> +			--nmdata ${1} --outfile ${1}${ALIAS} \
> +			--basedir ${srctree} --separator @ ${KAS_DATA}
> +	fi
> +	scripts/kallsyms ${kallsymopt} ${1}${ALIAS} > ${2}
>  }
>  
>  # Perform one step in kallsyms generation, including temporary linking of
> -- 
> 2.34.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>



[Index of Archives]     [Linux&nblp;USB Development]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite Secrets]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux