On Tue, Jun 18, 2024 at 11:57:38AM -0700, Luis Chamberlain wrote:
> On Fri, Jun 14, 2024 at 01:14:27PM -0400, Kris Van Hees wrote:
> > The offset range data for builtin modules is generated using:
> >  - modules.builtin: associates object files with module names
> >  - vmlinux.map: provides load order of sections and offset of first member
> >    per section
> >  - vmlinux.o.map: provides offset of object file content per section
> >  - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME
>
> What tests do we have to ensure this is working correctly and not
> spewing out lies? What proactive mechanisms do we have to verify the
> semantics won't change, or to warn at build time that this awk script
> will break upon new changes? Is this just best effort? Is that good
> enough? Why?

I posted a new patch series [0] that hopefully addresses your questions.
Most specifically, I included a patch with a verifier script that validates
the generated data. It is available for use but is not automatically
executed because the modules.builtin.ranges data is not required for proper
kernel operation. After all, the generated data is there for tools to use
and is not critical to the kernel itself.

While there is always the possibility of something breaking in this
generation due to future kernel changes, I'd say that this same issue
applies to pretty much everything in the build process of the kernel. Some
changes will always require other steps to be updated - I'll be happy to
maintain this contribution to help ensure changes are addressed.

The generated data depends on 2 main things for its correctness: the data
that is found in the linker maps, and the logic of the script parsing that
data.
The logic (documented in the commit message and in more detail in the
actual script [1]) is pretty straightforward because it is all based on a
linear walk of the content of vmlinux (using vmlinux.map), collecting the
start and end offsets of each object (CU) and aggregating this information
based on the built-in module(s) the object (CU) belongs to (if any). For
the case where vmlinux was linked using vmlinux.o, the script uses
vmlinux.o.map data to get the actual content of included sections.

The documented limitation is (of course) that if no data is available to
associate addresses (or offsets) in vmlinux with the source objects (CUs),
it is not possible to generate modules.builtin.ranges data. That is
reflected in the config option for generating this data, which conflicts
with LTO_CLANG_FULL and LTO_CLANG_THIN.

But again, given that the generated data does not directly impact the
operation of the kernel, the impact of possible breakage is minimal. And
like any other kernel feature, it will have to be maintained, which I will
happily do to ensure this works and keeps working.

	Kris

[0] https://lore.kernel.org/lkml/20240716031045.1781332-1-kris.van.hees@xxxxxxxxxx/
[1] https://lore.kernel.org/lkml/20240716031045.1781332-3-kris.van.hees@xxxxxxxxxx/
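
For illustration only, the linear walk and per-module aggregation described
above can be sketched roughly as follows. This is a minimal Python sketch,
not the awk script from the series [1]. The input tuples, the function name
aggregate_ranges, and the sample object paths, module names and offsets are
all hypothetical, and the sketch assumes the linker map has already been
parsed into (section, start, end, object) records in link order.

```python
def aggregate_ranges(objects, obj_to_modules):
    """Merge consecutive objects of the same module into offset ranges.

    objects: (section, start, end, object_file) tuples in link order
             (hypothetical, pre-parsed form of the linker map data).
    obj_to_modules: object_file -> module name(s); objects that do not
             belong to a built-in module are simply absent.
    Returns a list of (section, start, end, modules) tuples.
    """
    ranges = []
    cur = None  # range being built: [section, start, end, modules]
    for section, start, end, obj in objects:
        mods = obj_to_modules.get(obj)
        # Close the open range when the module(s) or the section change.
        if cur is not None and (mods != cur[3] or section != cur[0]):
            ranges.append(tuple(cur))
            cur = None
        if mods is None:
            continue  # object is not part of any built-in module
        if cur is None:
            cur = [section, start, end, mods]
        else:
            cur[2] = end  # extend the open range to cover this object
    if cur is not None:
        ranges.append(tuple(cur))
    return ranges


# Made-up sample data, only to show the shape of the input and output.
example_objects = [
    (".text", 0x000, 0x100, "init/main.o"),
    (".text", 0x100, 0x180, "drivers/foo/a.o"),
    (".text", 0x180, 0x200, "drivers/foo/b.o"),
    (".text", 0x200, 0x240, "kernel/fork.o"),
    (".text", 0x240, 0x300, "drivers/bar/c.o"),
    (".data", 0x000, 0x040, "drivers/foo/a.o"),
]
example_modules = {
    "drivers/foo/a.o": "foo",
    "drivers/foo/b.o": "foo",
    "drivers/bar/c.o": "bar",
}
result = aggregate_ranges(example_objects, example_modules)
```

With the sample data, the two adjacent objects of the hypothetical module
"foo" end up in a single .text range, and objects that belong to no module
produce no range at all. A sketch like this also shows why the data cannot
be generated when no object-to-offset association is available: without the
per-object records there is nothing to walk.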