Re: [PATCH 00/15] Implement MODVERSIONS for Rust

Petr Pavlu <petr.pavlu@xxxxxxxx> · Wed, 10 Jul 2024 09:30:49 +0200

On 6/17/24 19:58, Sami Tolvanen wrote:
> Hi folks,
> 
> This series implements CONFIG_MODVERSIONS for Rust, an important
> feature for distributions like Android that want to ship Rust
> kernel modules, and depend on modversions to help ensure module ABI
> compatibility.

Thanks for working on this. Below is some feedback with my (open)SUSE
hat on, although it should be quite general.

> There have been earlier proposals [1][2] that would allow Rust
> modules to coexist with modversions, but none that actually implement
> symbol versioning. Unlike C, Rust source code doesn't have sufficient
> information about the final ABI, as the compiler has considerable
> freedom in adjusting structure layout for improved performance [3],
> for example, which makes using a source code parser like genksyms
> a non-starter. Based on Matt's suggestion and previous feedback
> from maintainers, this series uses DWARF debugging information for
> computing versions. DWARF is an established and relatively stable
> format, which includes all the necessary ABI details, and adding a
> CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> reasonable trade-off.

Using the DWARF data makes sense to me. Distribution kernels are
normally built with debuginfo because one has to be able to debug them
later, unsurprisingly. Besides that, one now typically wants to use BPF
together with built-in BTF data (CONFIG_DEBUG_INFO_BTF), which also
requires building the kernel with debuginfo.

I would however keep in mind that producing all DWARF data has some
cost. From a quick test, an x86_64-defconfig build is ~30% slower for me
with CONFIG_DEBUG_INFO=y. The current genksyms tool allows to have
debuginfo disabled when backporting some patches and consequently have
a quicker feedback whether modversions changed. This option would
disappear with gendwarfksyms but I think it is acceptable.

> 
> The first 12 patches of this series add a small tool for computing
> symbol versions from DWARF, called gendwarfksyms. When passed a list
> of exported symbols, the tool generates an expanded type string
> for each symbol, and computes symbol CRCs similarly to genksyms.
> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> because of the existing support for C host tools that use elfutils
> (e.g., objtool).

In addition to calculating CRCs of exported symbols, genksyms has other
features which I think are important.

Firstly, the genksyms tool has a human-readable storage format for input
data used in the calculation of symbol CRCs. Setting the make variable
KBUILD_SYMTYPES enables dumping this data and storing it in *.symtypes
files.

When a developer later modifies the kernel and wants to check if some
symbols have changed, they can take these files and feed them as
*.symref back to genksyms. This allows the tool to provide an actual
reason why some symbols have changed, instead of just printing that
their CRCs are different.

Is there any plan to add the same functionality to gendwarfksyms, or do
you envison that people will use libabigail, Symbol-Type Graph, or
another tool for making this type of comparison?

Secondly, when distributions want to maintain stable kABI, they need to
be able to deal with patch backports that add new members to structures.
One common approach is to have placeholders in important structures
which can be later replaced by the new members as needed. __GENKSYMS__
ifdefs are then used at the C source level to hide these kABI-compatible
changes from genksyms.

Gendwarfksyms works on the resulting binary and so using such ifdefs
wouldn't work. Instead, I suspect that what is required is a mechanism
to tell the tool that a given change is ok, probably by allowing to
specify some map from the original definition to the new one.

Is there a plan to implement something like this, or how could it be
addressed?

> Another compatibility issue is fitting the extremely long mangled
> Rust symbol names into struct modversion_info, which can only hold
> 55-character names (on 64-bit systems). Previous proposals suggested
> changing the structure to support longer names, but the conclusion was
> that we cannot break userspace tools that parse the version table.
> 
> The next two patches of the series implement support for hashed
> symbol names in struct modversion_info, where names longer than 55
> characters are hashed, and the hash is stored in the name field. To
> avoid breaking userspace tools, the binary hash is prefixed with a
> null-terminated string containing the name of the hash function. While
> userspace tools can later be updated to potentially produce more
> useful information about the long symbols, this allows them to
> continue working in the meantime.

I think this approach with hashed names is quite complex. I'd personally
be also in favor of having a new section with variable-length strings to
store the names of imported symbols. As yet another alternative, it
should be also possible to refer directly into .symtab/.strtab to avoid
duplicating the names, but I suspect it would be non-trivial to
implement.

Cheers,
Petr