On 6/17/24 19:58, Sami Tolvanen wrote: > Hi folks, > > This series implements CONFIG_MODVERSIONS for Rust, an important > feature for distributions like Android that want to ship Rust > kernel modules, and depend on modversions to help ensure module ABI > compatibility. Thanks for working on this. Below is some feedback with my (open)SUSE hat on, although it should be quite general. > There have been earlier proposals [1][2] that would allow Rust > modules to coexist with modversions, but none that actually implement > symbol versioning. Unlike C, Rust source code doesn't have sufficient > information about the final ABI, as the compiler has considerable > freedom in adjusting structure layout for improved performance [3], > for example, which makes using a source code parser like genksyms > a non-starter. Based on Matt's suggestion and previous feedback > from maintainers, this series uses DWARF debugging information for > computing versions. DWARF is an established and relatively stable > format, which includes all the necessary ABI details, and adding a > CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a > reasonable trade-off. Using the DWARF data makes sense to me. Distribution kernels are normally built with debuginfo because one has to be able to debug them later, unsurprisingly. Besides that, one now typically wants to use BPF together with built-in BTF data (CONFIG_DEBUG_INFO_BTF), which also requires building the kernel with debuginfo. I would however keep in mind that producing all DWARF data has some cost. From a quick test, an x86_64-defconfig build is ~30% slower for me with CONFIG_DEBUG_INFO=y. The current genksyms tool allows to have debuginfo disabled when backporting some patches and consequently have a quicker feedback whether modversions changed. This option would disappear with gendwarfksyms but I think it is acceptable. > > The first 12 patches of this series add a small tool for computing > symbol versions from DWARF, called gendwarfksyms. When passed a list > of exported symbols, the tool generates an expanded type string > for each symbol, and computes symbol CRCs similarly to genksyms. > gendwarfksyms is written in C and uses libdw to process DWARF, mainly > because of the existing support for C host tools that use elfutils > (e.g., objtool). In addition to calculating CRCs of exported symbols, genksyms has other features which I think are important. Firstly, the genksyms tool has a human-readable storage format for input data used in the calculation of symbol CRCs. Setting the make variable KBUILD_SYMTYPES enables dumping this data and storing it in *.symtypes files. When a developer later modifies the kernel and wants to check if some symbols have changed, they can take these files and feed them as *.symref back to genksyms. This allows the tool to provide an actual reason why some symbols have changed, instead of just printing that their CRCs are different. Is there any plan to add the same functionality to gendwarfksyms, or do you envison that people will use libabigail, Symbol-Type Graph, or another tool for making this type of comparison? Secondly, when distributions want to maintain stable kABI, they need to be able to deal with patch backports that add new members to structures. One common approach is to have placeholders in important structures which can be later replaced by the new members as needed. __GENKSYMS__ ifdefs are then used at the C source level to hide these kABI-compatible changes from genksyms. Gendwarfksyms works on the resulting binary and so using such ifdefs wouldn't work. Instead, I suspect that what is required is a mechanism to tell the tool that a given change is ok, probably by allowing to specify some map from the original definition to the new one. Is there a plan to implement something like this, or how could it be addressed? > Another compatibility issue is fitting the extremely long mangled > Rust symbol names into struct modversion_info, which can only hold > 55-character names (on 64-bit systems). Previous proposals suggested > changing the structure to support longer names, but the conclusion was > that we cannot break userspace tools that parse the version table. > > The next two patches of the series implement support for hashed > symbol names in struct modversion_info, where names longer than 55 > characters are hashed, and the hash is stored in the name field. To > avoid breaking userspace tools, the binary hash is prefixed with a > null-terminated string containing the name of the hash function. While > userspace tools can later be updated to potentially produce more > useful information about the long symbols, this allows them to > continue working in the meantime. I think this approach with hashed names is quite complex. I'd personally be also in favor of having a new section with variable-length strings to store the names of imported symbols. As yet another alternative, it should be also possible to refer directly into .symtab/.strtab to avoid duplicating the names, but I suspect it would be non-trivial to implement. Cheers, Petr