Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Masami,

Thank you for your feedback.


Il giorno ven 29 set 2023 alle ore 13:28 Masami Hiramatsu
<mhiramat@xxxxxxxxxx> ha scritto:
>
> On Wed, 27 Sep 2023 17:35:16 +0000
> "Alessandro Carminati (Red Hat)" <alessandro.carminati@xxxxxxxxx> wrote:
>
> > It is not uncommon for drivers or modules related to similar peripherals
> > to have symbols with the exact same name.
> > While this is not a problem for the kernel's binary itself, it becomes an
> > issue when attempting to trace or probe specific functions using
> > infrastructure like ftrace or kprobe.
> >
> > The tracing subsystem relies on the `nm -n vmlinux` output, which provides
> > symbol information from the kernel's ELF binary. However, when multiple
> > symbols share the same name, the standard nm output does not differentiate
> > between them. This can lead to confusion and difficulty when trying to
> > probe the intended symbol.
> >
> >  ~ # cat /proc/kallsyms | grep " name_show"
> >  ffffffff8c4f76d0 t name_show
> >  ffffffff8c9cccb0 t name_show
> >  ffffffff8cb0ac20 t name_show
> >  ffffffff8cc728c0 t name_show
> >  ffffffff8ce0efd0 t name_show
> >  ffffffff8ce126c0 t name_show
> >  ffffffff8ce1dd20 t name_show
> >  ffffffff8ce24e70 t name_show
> >  ffffffff8d1104c0 t name_show
> >  ffffffff8d1fe480 t name_show
> >
> > kas_alias addresses this challenge by enhancing symbol names with
> > meaningful suffixes generated from the source file and line number
> > during the kernel build process.
> > These newly generated aliases provide tracers with the ability to
> > comprehend the symbols they are interacting with when utilizing the
> > ftracefs interface.
> > This approach may also allow for the probing by name of previously
> > inaccessible symbols.
> >
> >  ~ # cat /proc/kallsyms | grep gic_mask_irq
> >  ffffd15671e505ac t gic_mask_irq
> >  ffffd15671e505ac t gic_mask_irq@drivers_irqchip_irq_gic_c_167
> >  ffffd15671e532a4 t gic_mask_irq
> >  ffffd15671e532a4 t gic_mask_irq@drivers_irqchip_irq_gic_v3_c_407
> >  ~ #
> >
> > Changes from v1:
> > - Integrated changes requested by Masami to exclude symbols with prefixes
> >   "_cfi" and "_pfx".
> > - Introduced a small framework to handle patterns that need to be excluded
> >   from the alias production.
> > - Excluded other symbols using the framework.
> > - Introduced the ability to discriminate between text and data symbols.
> > - Added two new config symbols in this version: CONFIG_KALLSYMS_ALIAS_DATA,
> >   which allows data for data, and CONFIG_KALLSYMS_ALIAS_DATA_ALL, which
> >   excludes all filters and provides an alias for each duplicated symbol.
> >
> > https://lore.kernel.org/all/20230711151925.1092080-1-alessandro.carminati@xxxxxxxxx/
> >
> > Changes from v2:
> > - Alias tags are created by querying DWARF information from the vmlinux.
> > - The filename + line number is normalized and appended to the original
> >   name.
> > - The tag begins with '@' to indicate the symbol source.
> > - Not a change, but worth mentioning, since the alias is added to the
> >   existing list, the old duplicated name is preserved, and the livepatch
> >   way of dealing with duplicates is maintained.
> > - Acknowledging the existence of scenarios where inlined functions
> >   declared in header files may result in multiple copies due to compiler
> >   behavior, though it is not actionable as it does not pose an operational
> >   issue.
> > - Highlighting a single exception where the same name refers to different
> >   functions: the case of "compat_binfmt_elf.c," which directly includes
> >   "binfmt_elf.c" producing identical function copies in two separate
> >   modules.
> >
> > https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@xxxxxxxxx/
> >
> > Changes from v3:
> > - kas_alias was rewritten in Python to create a more concise and
> >   maintainable codebase.
> > - The previous automation process used by kas_alias to locate the vmlinux
> >   and the addr2line has been replaced with an explicit command-line switch
> >   for specifying these requirements.
> > - addr2line has been added into the main Makefile.
> > - A new command-line switch has been introduced, enabling users to extend
> >   the alias to global data names.
> >
> > https://lore.kernel.org/all/20230828080423.3539686-1-alessandro.carminati@xxxxxxxxx/
> >
> > Changes from v4:
> > - Fixed the O=<build dir> build issue
> > - The tool halts execution upon encountering major issues, thereby ensuring
> >   the pipeline is interrupted.
> > - A cmdline option to specify the source directory added.
> > - Minor code adjusments.
> > - Tested on mips32 and i386
> >
> > https://lore.kernel.org/all/20230919193948.465340-1-alessandro.carminati@xxxxxxxxx/
> >
> > NOTE:
> > About the symbols name duplication that happens as consequence of the
> > inclusion compat_binfmt_elf.c does, it is evident that this corner is
> > inherently challenging the addr2line approach.
> > Attempting to conceal this limitation would be counterproductive.
> >
> > compat_binfmt_elf.c includes directly binfmt_elf.c, addr2line can't help
> > but report all functions and data declared by that file, coming from
> > binfmt_elf.c.
> >
> > My position is that, rather than producing a more complicated pipeline
> > to handle this corner case, it is better to fix the compat_binfmt_elf.c
> > anomaly.
> >
> > This patch does not deal with the two potentially problematic symbols
> > defined by compat_binfmt_elf.c
>
> Thanks for update! I confirmed that the suffixes on the same-name symbols.
> But I have one comment below;
>
> >
> > Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@xxxxxxxxx>
> > ---
> >  Makefile                |   4 +-
> >  init/Kconfig            |  22 +++++++
> >  scripts/kas_alias.py    | 136 ++++++++++++++++++++++++++++++++++++++++
> >  scripts/link-vmlinux.sh |  21 ++++++-
> >  4 files changed, 180 insertions(+), 3 deletions(-)
> >  create mode 100755 scripts/kas_alias.py
> >
> > diff --git a/Makefile b/Makefile
> > index 4f283d915e54..f33c179f4cc3 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -488,6 +488,7 @@ OBJCOPY           = $(LLVM_PREFIX)llvm-objcopy$(LLVM_SUFFIX)
> >  OBJDUMP              = $(LLVM_PREFIX)llvm-objdump$(LLVM_SUFFIX)
> >  READELF              = $(LLVM_PREFIX)llvm-readelf$(LLVM_SUFFIX)
> >  STRIP                = $(LLVM_PREFIX)llvm-strip$(LLVM_SUFFIX)
> > +ADDR2LINE    = $(LLVM_PREFIX)llvm-addr2line$(LLVM_SUFFIX)
> >  else
> >  CC           = $(CROSS_COMPILE)gcc
> >  LD           = $(CROSS_COMPILE)ld
> > @@ -497,6 +498,7 @@ OBJCOPY           = $(CROSS_COMPILE)objcopy
> >  OBJDUMP              = $(CROSS_COMPILE)objdump
> >  READELF              = $(CROSS_COMPILE)readelf
> >  STRIP                = $(CROSS_COMPILE)strip
> > +ADDR2LINE    = $(CROSS_COMPILE)addr2line
> >  endif
> >  RUSTC                = rustc
> >  RUSTDOC              = rustdoc
> > @@ -611,7 +613,7 @@ export RUSTC_BOOTSTRAP := 1
> >  export ARCH SRCARCH CONFIG_SHELL BASH HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE LD CC HOSTPKG_CONFIG
> >  export RUSTC RUSTDOC RUSTFMT RUSTC_OR_CLIPPY_QUIET RUSTC_OR_CLIPPY BINDGEN CARGO
> >  export HOSTRUSTC KBUILD_HOSTRUSTFLAGS
> > -export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
> > +export CPP AR NM STRIP OBJCOPY OBJDUMP READELF ADDR2LINE PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
> >  export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX
> >  export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD
> >  export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 6d35728b94b2..d45dd423e1ec 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1738,6 +1738,28 @@ config KALLSYMS_BASE_RELATIVE
> >         time constants, and no relocation pass is required at runtime to fix
> >         up the entries based on the runtime load address of the kernel.
> >
> > +config KALLSYMS_ALIAS_SRCLINE
> > +     bool "Produces alias for duplicated text symbols" if EXPERT
> > +     depends on KALLSYMS && DEBUG_INFO && !DEBUG_INFO_SPLIT
> > +     help
> > +       It is not uncommon for drivers or modules related to similar
> > +       peripherals to have symbols with the exact same name.
> > +       While this is not a problem for the kernel's binary itself, it
> > +       becomes an issue when attempting to trace or probe specific
> > +       functions using infrastructure like ftrace or kprobe.
> > +
> > +       This option addresses this challenge, producing alias for text
> > +       symbol names that include the file name and line where the symbols
> > +       are defined in the source code.
> > +
> > +config KALLSYMS_ALIAS_SRCLINE_DATA
> > +     bool "Produces alias also for global variables names"
> > +     depends on KALLSYMS_ALIAS_SRCLINE
> > +     help
> > +       Sometimes it can be useful to refer to global vars by name. Since
> > +       they suffer the same issue as text symbols, this config option
> > +       allows having aliases for global variables names too.
> > +
> >  # end of the "standard kernel features (expert users)" menu
> >
> >  # syscall, maps, verifier
> > diff --git a/scripts/kas_alias.py b/scripts/kas_alias.py
> > new file mode 100755
> > index 000000000000..7c6b7045081c
> > --- /dev/null
> > +++ b/scripts/kas_alias.py
> > @@ -0,0 +1,136 @@
> > +#!/usr/bin/env python3
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Copyright (C) 2023 Red Hat, Inc. Alessandro Carminati <alessandro.carminati@xxxxxxxxx>
> > +#
> > +# kas_alias: Adds alias to duplicate symbols in the kallsyms output.
> > +
> > +import subprocess
> > +import sys
> > +import os
> > +import argparse
> > +import re
> > +from collections import namedtuple
> > +
> > +regex_filter = [
> > +        "^__compound_literal\\.[0-9]+$",
> > +        "^__[wm]*key\\.[0-9]+$",
> > +        "^_*TRACE_SYSTEM.*$",
> > +        "^__already_done\\.[0-9]+$",
> > +        "^__msg\\.[0-9]+$",
> > +        "^__func__\\.[0-9]+$",
> > +        "^CSWTCH\\.[0-9]+$",
> > +        "^_rs\\.[0-9]+$",
> > +        "^___tp_str\\.[0-9]+$",
> > +        "^__flags\\.[0-9]+$",
> > +        "^___done\\.[0-9]+$",
> > +        "^__print_once\\.[0-9]+$",
> > +        "^___once_key\\.[0-9]+$",
> > +        "^__pfx_.*$",
> > +        "^__cfi_.*$"
> > +        ]
>
> I saw __pfx_ symbols have suffixes too, this is because this filter is
> only applied when the kas_alias is enabled for data symbols .
>
> > +
> > +class SeparatorType:
> > +    def __call__(self, separator):
> > +        if len(separator) != 1:
> > +            raise argparse.ArgumentTypeError("Separator must be a single character")
> > +        return separator
> > +
> > +Line = namedtuple('Line', ['address', 'type', 'name'])
> > +
> > +def parse_file(filename):
> > +    symbol_list = []
> > +    name_occurrences = {}
> > +
> > +    with open(filename, 'r') as file:
> > +        for line in file:
> > +            fields = line.strip().split()
> > +
> > +            if len(fields) >= 3:
> > +                address, type, name = fields[0], fields[1], ' '.join(fields[2:])
> > +                symbol_list.append(Line(address, type, name))
> > +                name_occurrences[name] = name_occurrences.get(name, 0) + 1
> > +
> > +    return symbol_list, name_occurrences
> > +
> > +def find_duplicate(symbol_list, name_occurrences):
> > +    name_to_lines = {}
> > +    duplicate_lines = []
> > +
> > +    for line in symbol_list:
> > +        if line.name in name_to_lines:
> > +            first_occurrence = name_to_lines[line.name]
> > +            duplicate_lines.extend([first_occurrence, line])
> > +        else:
> > +            name_to_lines[line.name] = line
> > +
> > +    return duplicate_lines
> > +
> > +def start_addr2line_process(binary_file, addr2line_file):
> > +    try:
> > +        addr2line_process = subprocess.Popen([addr2line_file, '-fe', binary_file],
> > +                                             stdin=subprocess.PIPE,
> > +                                             stdout=subprocess.PIPE,
> > +                                             stderr=subprocess.PIPE,
> > +                                             text=True)
> > +        return addr2line_process
> > +    except Exception as e:
> > +        print(f"Error starting addr2line process: {str(e)}")
> > +        sys.exit(1)
> > +
> > +def addr2line_fetch_address(addr2line_process, address):
> > +    try:
> > +        addr2line_process.stdin.write(address + '\n')
> > +        addr2line_process.stdin.flush()
> > +        addr2line_process.stdout.readline().strip()
> > +        output = addr2line_process.stdout.readline().strip()
> > +
> > +        return os.path.normpath(output)
> > +    except Exception as e:
> > +        print(f"Error communicating with addr2line: {str(e)}")
> > +        sys.exit(1)
> > +
> > +def process_line(obj, config):
> > +    if config:
> > +        return not (any(re.match(regex, obj.name) for regex in regex_filter))
> > +    else:
> > +        return obj.type in {"T", "t"}
>
> Here is the problem. The regex_filter check must be done for all text symbols
> if config == False. (BTW, this 'config' is too generic name, 'text_only' will
> be more readable.)

Initially, I had a performance concern about extending the regex check to
all symbols. However, after conducting some testing, I agree that extending
the check to both text and data symbols does indeed result in a negligible
performance penalty.

I'll take your suggestion into consideration and make the necessary changes
to improve readability by renaming 'config' to 'process_data_sym'.
I'll be sure to
incorporate these suggestions in a new version of the code.
Your feedback is invaluable, and I appreciate your patience.

Regards.
Alessandro
>
> Thank you,
>
> > +
> > +if __name__ == "__main__":
> > +    parser = argparse.ArgumentParser(description='Add alias to multiple occurring symbols name in kallsyms')
> > +    parser.add_argument('-a', "--addr2line", dest="addr2line_file", required=True)
> > +    parser.add_argument('-v', "--vmlinux", dest="vmlinux_file", required=True)
> > +    parser.add_argument('-o', "--outfile", dest="output_file", required=True)
> > +    parser.add_argument('-n', "--nmdata", dest="nm_data_file", required=True)
> > +    parser.add_argument('-b', "--basedir", dest="linux_base_dir", required=True)
> > +    parser.add_argument('-s', "--separator", dest="separator", required=False, default="@", type=SeparatorType())
> > +    parser.add_argument('-d', "--data", dest="include_data", required=False, action='store_true')
> > +    config = parser.parse_args()
> > +
> > +    try:
> > +        config.linux_base_dir = os.path.normpath(os.getcwd() + "/" + config.linux_base_dir) + "/"
> > +        symbol_list, name_occurrences = parse_file(config.nm_data_file)
> > +        addr2line_process = start_addr2line_process(config.vmlinux_file, config.addr2line_file)
> > +
> > +        with open(config.output_file, 'w') as file:
> > +            for obj in symbol_list:
> > +                file.write(f"{obj.address} {obj.type} {obj.name}\n")
> > +                if (name_occurrences[obj.name] > 1) and process_line(obj, config.include_data) :
> > +                    output = addr2line_fetch_address(addr2line_process, obj.address)
> > +                    decoration = config.separator + "".join(
> > +                        "_" if not c.isalnum() else c for c in output.replace(config.linux_base_dir, "")
> > +                    )
> > +                    # The addr2line can emit the special string "?:??" when addr2line can not find the
> > +                    # specified address in the DWARF section that after normalization it becomes "____".
> > +                    # In such cases, emitting an alias wouldn't make sense, so it is skipped.
> > +                    if decoration != config.separator + "____":
> > +                        file.write(f"{obj.address} {obj.type} {obj.name + decoration}\n")
> > +
> > +        addr2line_process.stdin.close()
> > +        addr2line_process.stdout.close()
> > +        addr2line_process.stderr.close()
> > +        addr2line_process.wait()
> > +
> > +    except Exception as e:
> > +        print(f"An error occurred: {str(e)}")
> > +        raise SystemExit("Script terminated due to an error")
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index a432b171be82..c110b0f58a19 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@ -91,7 +91,12 @@ vmlinux_link()
> >
> >       # The kallsyms linking does not need debug symbols included.
> >       if [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then
> > -             ldflags="${ldflags} ${wl}--strip-debug"
> > +             # The kallsyms linking does not need debug symbols included,
> > +             # unless the KALLSYMS_ALIAS_SRCLINE.
> > +             if ! is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE && \
> > +                [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then
> > +                     ldflags="${ldflags} ${wl}--strip-debug"
> > +             fi
> >       fi
> >
> >       if is_enabled CONFIG_VMLINUX_MAP; then
> > @@ -161,7 +166,19 @@ kallsyms()
> >       fi
> >
> >       info KSYMS ${2}
> > -     scripts/kallsyms ${kallsymopt} ${1} > ${2}
> > +     ALIAS=""
> > +     KAS_DATA=""
> > +     if is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE_DATA; then
> > +             KAS_DATA="--data"
> > +     fi
> > +     if is_enabled CONFIG_KALLSYMS_ALIAS_SRCLINE; then
> > +             ALIAS=".alias"
> > +             ${srctree}/scripts/kas_alias.py \
> > +                     --addr2line ${ADDR2LINE} --vmlinux ${kallsyms_vmlinux} \
> > +                     --nmdata ${1} --outfile ${1}${ALIAS} \
> > +                     --basedir ${srctree} --separator @ ${KAS_DATA}
> > +     fi
> > +     scripts/kallsyms ${kallsymopt} ${1}${ALIAS} > ${2}
> >  }
> >
> >  # Perform one step in kallsyms generation, including temporary linking of
> > --
> > 2.34.1
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>



[Index of Archives]     [Linux&nblp;USB Development]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite Secrets]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux