On Fri, May 17, 2024 at 1:31 PM Kris Van Hees <kris.van.hees@xxxxxxxxxx> wrote:
>
> The offset range data for builtin modules is generated using:
> - modules.builtin.modinfo: associates object files with module names
> - vmlinux.map: provides load order of sections and offset of first member
>   per section
> - vmlinux.o.map: provides offset of object file content per section
> - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME
>
> The generated data will look like:
>
> .text 00000000-00000000 = _text
> .text 0000baf0-0000cb10 amd_uncore
> .text 0009bd10-0009c8e0 iosf_mbi
> ...
> .text 008e6660-008e9630 snd_soc_wcd_mbhc
> .text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x
> .text 008ea610-008ea780 snd_soc_wcd9335
> ...
> .data 00000000-00000000 = _sdata
> .data 0000f020-0000f680 amd_uncore
>
> For each ELF section, it lists the offset of the first symbol. This can
> be used to determine the base address of the section at runtime.
>
> Next, it lists (in strict ascending order) offset ranges in that section
> that cover the symbols of one or more builtin modules. Multiple ranges
> can apply to a single module, and ranges can be shared between modules.
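
To make the format described above concrete, here is a minimal sketch of how
a consumer might read it and map a runtime address back to module names. It
is an illustration only, not part of the patch. The helper names
(parse_ranges, section_base, modules_at) are hypothetical, the ranges are
assumed to be half-open [start, end), and the runtime address of the anchor
symbol is assumed to come from somewhere else (for example a symbol table).

```python
from bisect import bisect_right

# A few lines copied from the example output in the commit message.
SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x
.text 008ea610-008ea780 snd_soc_wcd9335
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
"""


def parse_ranges(text):
    """Split ranges-file text into anchors and per-section module ranges.

    anchors: section -> (anchor_offset, anchor_symbol)
    ranges:  section -> list of (start, end, [module, ...]) in file order
    """
    anchors, ranges = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or elided ("...") lines
        section, span, rest = fields[0], fields[1], fields[2:]
        start, end = (int(part, 16) for part in span.split("-"))
        if rest[0] == "=" and len(rest) >= 2:
            anchors[section] = (start, rest[1])
        else:
            ranges.setdefault(section, []).append((start, end, rest))
    return anchors, ranges


def section_base(anchors, section, anchor_addr):
    """Base address of a section, given the anchor symbol's runtime address."""
    offset, _symbol = anchors[section]
    return anchor_addr - offset


def modules_at(anchors, ranges, section, anchor_addr, addr):
    """Module names whose range covers addr, or [] if none does.

    Assumes half-open ranges and relies on the ascending order of the file.
    """
    offset = addr - section_base(anchors, section, anchor_addr)
    spans = ranges.get(section, [])
    starts = [start for start, _end, _mods in spans]
    i = bisect_right(starts, offset) - 1
    if i >= 0 and spans[i][0] <= offset < spans[i][1]:
        return spans[i][2]
    return []


if __name__ == "__main__":
    anchors, ranges = parse_ranges(SAMPLE)
    text_addr = 0xFFFFFFFF81000000  # hypothetical runtime address of _text
    print(modules_at(anchors, ranges, ".text", text_addr, text_addr + 0xBAF0))
```

The lookup uses a binary search only because the file is stated to be in
strict ascending order; a linear scan would work equally well for a sketch.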
>
> Signed-off-by: Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> Reviewed-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
> ---
> Changes since v2:
> - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
> - Switched from using modules.builtin.objs to parsing .*.cmd files
> - Parse data from .*.cmd in generate_builtin_ranges.awk
> ---
> scripts/generate_builtin_ranges.awk | 232 ++++++++++++++++++++++++++++
> 1 file changed, 232 insertions(+)
> create mode 100755 scripts/generate_builtin_ranges.awk
>
> diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> new file mode 100755
> index 0000000000000..6975a9c7266d9
> --- /dev/null
> +++ b/scripts/generate_builtin_ranges.awk
> @@ -0,0 +1,232 @@
> +#!/usr/bin/gawk -f
> +# SPDX-License-Identifier: GPL-2.0
> +# generate_builtin_ranges.awk: Generate address range data for builtin modules
> +# Written by Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> +#
> +# Usage: generate_builtin_ranges.awk modules.builtin.modinfo vmlinux.map \
> +#		vmlinux.o.map > modules.builtin.ranges
> +#
> +
> +BEGIN {
> +	# modules.builtin.modinfo uses \0 as record separator
> +	# All other files use \n.
> +	RS = "[\n\0]";
> +}

Why do you want to parse modules.builtin.modinfo
instead of modules.builtin?

modules.builtin uses \n as its separator.

> +
> +# Return the module name(s) (if any) associated with the given object.
> +#
> +# If we have seen this object before, return information from the cache.
> +# Otherwise, retrieve it from the corresponding .cmd file.
> +#
> +function get_module_info(fn, mod, obj, mfn, s) {

There are 5 arguments, while the caller passes only 1 argument
( get_module_info($4) ).

> +	if (fn in omod)
> +		return omod[fn];
> +
> +	if (match(fn, /\/[^/]+$/) == 0)
> +		return "";
> +
> +	obj = fn;
> +	mod = "";
> +	mfn = "";
> +	fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> +	if (getline s <fn == 1) {
> +		if (match(s, /DKBUILD_MODNAME=[^ ]+/) > 0) {
> +			mod = substr(s, RSTART + 17, RLENGTH - 17);
> +			gsub(/['"]/, "", mod);
> +			gsub(/:/, " ", mod);
> +		}
> +
> +		if (match(s, /DKBUILD_MODFILE=[^ ]+/) > 0) {
> +			mfn = substr(s, RSTART + 17, RLENGTH - 17);
> +			gsub(/['"]/, "", mfn);
> +			gsub(/:/, " ", mfn);
> +		}
> +	}
> +	close(fn);
> +
> +# tmp = $0;
> +# $0 = s;
> +# print mod " " mfn " " obj " " $NF;
> +# $0 = tmp;
> +
> +	# A single module (common case) also reflects objects that are not part
> +	# of a module. Some of those objects have names that are also a module
> +	# name (e.g. core). We check the associated module file name, and if
> +	# they do not match, the object is not part of a module.

You do not need to use KBUILD_MODNAME.

Just use KBUILD_MODFILE only.
If the same path is found in modules.builtin, it is a built-in module.
Its basename is the modname.

One more question about a corner case.
How does your code work when an object is shared by multiple modules?

For example, set
  CONFIG_EDAC_SKX=y
  CONFIG_EDAC_I10NM=y

How is the address range of drivers/edac/skx_common.o handled?

There are 4 possibilities.

 - included in skx_edac
 - included in i10nm_edac
 - included in both of them
 - not included in any of them

The correct behavior should be "included in both of them".

How does your code work?

> +	if (mod !~ / /) {
> +		if (!(mod in mods))
> +			return "";
> +		if (mods[mod] != mfn)
> +			return "";
> +	}
> +
> +	# At this point, mod is a single (valid) module name, or a list of
> +	# module names (that do not need validation).
> +	omod[obj] = mod;
> +	close(fn);
> +
> +	return mod;
> +}
> +
> +FNR == 1 {
> +	FC++;
> +}
> +
> +# (1-old) Build a mapping to associate object files with built-in module names.
> +#
> +# The first file argument is used as input (modules.builtin.objs).
> +#
> +FC == 1 && old_behaviour {
> +	sub(/:/, "");
> +	mod = $1;
> +	sub(/([^/]*\/)+/, "", mod);
> +	sub(/\.o$/, "", mod);
> +	gsub(/-/, "_", mod);
> +
> +	if (NF > 1) {
> +		for (i = 2; i <= NF; i++) {
> +			if ($i in mods)
> +				mods[$i] = mods[$i] " " mod;
> +			else
> +				mods[$i] = mod;
> +		}
> +	} else
> +		mods[$1] = mod;
> +
> +	next;
> +}

Please remove the old code.

> +# (1) Build a lookup map of built-in module names.
> +#
> +# The first file argument is used as input (modules.builtin.modinfo).
> +#
> +# We are interested in lines that follow the format
> +#     <modname>.file=<path>
> +# and use them to record <modname>
> +#
> +FC == 1 && /^[^\.]+.file=/ {
> +	gsub(/[\.=]/, " ");
> +# print $1 " -> " $3;
> +	mods[$1] = $3;
> +	next;
> +}

I guess parsing modules.builtin will be simpler.

> +
> +# (2) Determine the load address for each section.
> +#
> +# The second file argument is used as input (vmlinux.map).
> +#
> +# Since some AWK implementations cannot handle large integers, we strip of the
> +# first 4 hex digits from the address. This is safe because the kernel space
> +# is not large enough for addresses to extend into those digits.
> +#
> +FC == 2 && /^\./ && NF > 2 {
> +	if (type)
> +		delete sect_addend[type];
> +
> +	if ($1 ~ /percpu/)
> +		next;
> +
> +	raw_addr = $2;
> +	addr_prefix = "^" substr($2, 1, 6);
> +	sub(addr_prefix, "0x", $2);
> +	base = strtonum($2);
> +	type = $1;
> +	anchor = 0;
> +	sect_base[type] = base;
> +
> +	next;
> +}
> +
> +!type {
> +	next;
> +}
> +
> +# (3) We need to determine the base address of the section so that ranges can
> +# be expressed based on offsets from the base address. This accommodates the
> +# kernel sections getting loaded at different addresses than what is recorded
> +# in vmlinux.map.
> +#
> +# At runtime, we will need to determine the base address of each section we are
> +# interested in. We do that by recording the offset of the first symbol in the
> +# section. Once we know the address of this symbol in the running kernel, we
> +# can calculate the base address of the section.
> +#
> +# If possible, we use an explicit anchor symbol (sym = .) listed at the base
> +# address (offset 0).
> +#
> +# If there is no such symbol, we record the first symbol in the section along
> +# with its offset.
> +#
> +# We also determine the offset of the first member in the section in case the
> +# final linking inserts some content between the start of the section and the
> +# first member. I.e. in that case, vmlinux.map will list the first member at
> +# a non-zero offset whereas vmlinux.o.map will list it at offset 0. We record
> +# the addend so we can apply it when processing vmlinux.o.map (next).
> +#
> +FC == 2 && !anchor && raw_addr == $1 && $3 == "=" && $4 == "." {
> +	anchor = sprintf("%s %08x-%08x = %s", type, 0, 0, $2);
> +	sect_anchor[type] = anchor;
> +
> +	next;
> +}
> +
> +FC == 2 && !anchor && $1 ~ /^0x/ && $2 !~ /^0x/ && NF <= 4 {
> +	sub(addr_prefix, "0x", $1);
> +	addr = strtonum($1) - base;
> +	anchor = sprintf("%s %08x-%08x = %s", type, addr, addr, $2);
> +	sect_anchor[type] = anchor;
> +
> +	next;
> +}
> +
> +FC == 2 && base && /^ \./ && $1 == type && NF == 4 {
> +	sub(addr_prefix, "0x", $2);
> +	addr = strtonum($2);
> +	sect_addend[type] = addr - base;
> +
> +	if (anchor) {
> +		base = 0;
> +		type = 0;
> +	}
> +
> +	next;
> +}
> +
> +# (4) Collect offset ranges (relative to the section base address) for built-in
> +# modules.
> +#
> +FC == 3 && /^ \./ && NF == 4 && $3 != "0x0" {
> +	type = $1;
> +	if (!(type in sect_addend))
> +		next;

This assumes a 1:1 mapping of sections between vmlinux.o and vmlinux.

How far does this assumption hold?

At the very least, it will not work with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.

As I said in the previous review,
gawk is not documented in Documentation/process/changes.rst.

Please add it if you go with gawk.
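
For reference while reading step (4), the way this rule turns a stream of
per-object offsets into ranges can be sketched as below. This is an
illustrative model only, not a proposed replacement. The function name
coalesce and the input shape (a list of (offset, module_names) pairs in
ascending offset order, with "" meaning "not part of a built-in module") are
assumptions made for the sketch.

```python
def coalesce(objects):
    """Merge consecutive objects that map to the same module string.

    objects: iterable of (offset, module_names) in ascending offset order.
             module_names is "" when the object belongs to no built-in
             module, or a space-separated list when it is shared.
    Returns a list of (start, end, module_names) tuples.

    Like the quoted awk rule, a range is only emitted once a later object
    with a different module string is seen, so a range that is still open
    at the end of the input is not emitted.
    """
    ranges = []
    cur_mod, cur_start = "", 0
    for offset, mod in objects:
        if mod == cur_mod:
            continue  # same owner as the previous object: extend the range
        if cur_mod:
            ranges.append((cur_start, offset, cur_mod))
        cur_mod, cur_start = mod, offset
    return ranges


if __name__ == "__main__":
    # Hypothetical object layout; the module names are placeholders.
    layout = [
        (0x0000, ""),
        (0x1000, "mod_a"),
        (0x1800, "mod_a"),
        (0x2000, "mod_a mod_b"),
        (0x2400, "mod_b"),
        (0x3000, ""),
    ]
    for start, end, mods in coalesce(layout):
        print("%08x-%08x %s" % (start, end, mods))
```

In this model an object shared by two modules shows up as its own range
carrying both names, which corresponds to the "included in both of them"
case raised earlier in this review.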
> +
> +	sub(addr_prefix, "0x", $2);
> +	addr = strtonum($2) + sect_addend[type];
> +
> +	mod = get_module_info($4);
> +# printf "[%s, %08x] %s [%s] %08x\n", mod_name, mod_start, $4, mod, addr;
> +	if (mod == mod_name)
> +		next;
> +
> +	if (mod_name) {
> +		idx = mod_start + sect_base[type] + sect_addend[type];
> +		entries[idx] = sprintf("%s %08x-%08x %s", type, mod_start, addr, mod_name);
> +		count[type]++;
> +	}
> +# if (mod == "")
> +# printf "ENTRY WITHOUT MOD - MODULE MAY END AT %08x\n", addr
> +
> +	mod_name = mod;
> +	mod_start = addr;
> +}
> +
> +END {
> +	for (type in count) {
> +		if (type in sect_anchor)
> +			entries[sect_base[type]] = sect_anchor[type];
> +	}
> +
> +	n = asorti(entries, indices);
> +	for (i = 1; i <= n; i++)
> +		print entries[indices[i]];
> +}
> --
> 2.43.0
>

--
Best Regards
Masahiro Yamada