Re: BTF compatibility issue across builds


On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
Hi,

We recently ran into a module load failure related to split BTF on openSUSE
Tumbleweed[1], which I believe may also happen on other rolling distros.

The error looks like the following (though the failure is not limited to ipheth):

     BPF:[103111] STRUCT
     BPF:size=152 vlen=2
     BPF:
     BPF:Invalid name
     BPF:

     failed to validate module [ipheth] BTF: -22

The error comes down to loading the BTF of *kernel modules from a different
build* than the running kernel (built from the same source), where the base
BTF of the two builds differs.
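To make the type ID side of this concrete, here is a toy model, a minimal
sketch assuming that a module's own types are numbered right after the last
type ID of the base (vmlinux) BTF and that lower IDs are looked up in the
base. `struct toy_btf`, `toy_resolve_type()`, `struct my_mod_priv` and the
type lists are all hypothetical; real BTF stores `struct btf_type` records
rather than names, and type ID 0 (void) is not modeled.

```c
#include <stddef.h>

/* Hypothetical stand-in for a BTF blob: type ID i (1-based) is
 * described by names[i - 1]. Type ID 0 (void) is not modeled. */
struct toy_btf {
	const char *const *names;
	unsigned int nr_types;
};

/* Made-up base BTF of two builds of the same source: build#3 has one
 * extra FUNC entry, which shifts every later type ID by one. */
static const char *const build1_types[] = { "int", "struct sock" };
static const char *const build3_types[] = {
	"int", "FUNC i2c_smbus_check_pec", "struct sock"
};
static const struct toy_btf build1_base = { build1_types, 2 };
static const struct toy_btf build3_base = { build3_types, 3 };

/* Made-up split BTF of a module built against build#3: its own first
 * type gets ID 4 (build#3 nr_types + 1), and it refers to
 * "struct sock" by build#3's ID, which is 3. */
static const char *const mod_types[] = { "struct my_mod_priv" };
static const struct toy_btf mod_split = { mod_types, 1 };

/* Resolve a type ID in split BTF: IDs up to base->nr_types belong to
 * the base, higher IDs to the module. Returns NULL if out of range. */
static const char *toy_resolve_type(const struct toy_btf *base,
				    const struct toy_btf *split,
				    unsigned int id)
{
	if (id == 0)
		return NULL;
	if (id <= base->nr_types)
		return base->names[id - 1];
	id -= base->nr_types;
	if (id <= split->nr_types)
		return split->names[id - 1];
	return NULL;
}
```

In this model, ID 3 resolves to "struct sock" against build#3's base, but
against build#1's base it falls past the end of the base and resolves to the
module's own "struct my_mod_priv", while the module's own ID 4 resolves to
nothing at all.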

While it may be a stretch to call this a bug, solving it might make BTF
adoption easier. Naively, I think we could avoid this issue by further
splitting the base BTF into two parts: .BTF would contain only exported
types, and the other part (still residing in vmlinux) would hold the
unexported types.

What are the exported types? The types used by exported symbols?
This will certainly increase BTF handling complexity.


Does that sound like something reasonable to work on?


## Root cause (in case anyone is interested in a more verbose version)

On openSUSE Tumbleweed there can be several builds of the same source. Since
the source is the same, the binaries are simply replaced when a package with
a higher build number is installed during an upgrade.

In our case, a rebuild was triggered[2], which resulted in changes to the
base BTF. More precisely, the BTF_KIND_FUNC{,_PROTO} entries of
i2c_smbus_check_pec(u8 cpec, struct i2c_msg *msg) and
inet_lhash2_bucket_sk(struct inet_hashinfo *h, struct sock *sk) were added to
the base BTF of 5.15.12-1.3. Those functions were previously missing from the
base BTF of 5.15.12-1.1.

As noted in [2] below, I think we should understand why the rebuild was triggered. If vmlinux is rebuilt, why can't the modules be rebuilt at the same time?


The additional entries in the BTF type and string tables shifted the type
IDs and string positions in the base BTF, so the same type ID may refer to a
totally different type, and likewise for the name_off of types.
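The string side can be illustrated the same way. This is a minimal sketch
assuming a BTF string section is a blob of NUL-terminated strings indexed by
byte offset (with offset 0 being the empty string); the two tables, the extra
string "pec" (a short made-up stand-in for the newly added names) and
`toy_str_by_offset()` are hypothetical.

```c
#include <stddef.h>

/* Made-up base string sections of two builds. build#3 gained one extra
 * string ("pec"), so "sock" moved from offset 5 to offset 9.
 * sizeof includes each literal's implicit trailing NUL. */
static const char build1_strs[] = "\0int\0sock";     /* 10 bytes */
static const char build3_strs[] = "\0int\0pec\0sock"; /* 14 bytes */

/* Look up a name_off in one string section; NULL if out of range. */
static const char *toy_str_by_offset(const char *strs, size_t len,
				     unsigned int off)
{
	return off < len ? strs + off : NULL;
}
```

A module built against build#3 records name_off 9 for "sock"; resolved
against build#1's table, the same offset lands on the NUL terminating "sock",
i.e. an empty name.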

When users on build#1 (i.e. 5.15.12-1.1) install build#3 (i.e. 5.15.12-1.3)
and then try to load a kernel module, they will be loading a build#3 module
on a build#1 kernel. Since the base BTF of the two builds differs, the
name_off of some types ends up pointing at an invalid string, and the kernel
bails out.
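The final rejection can be approximated with a check along these lines. This
is a minimal sketch loosely modeled on the identifier rule the kernel applies
to BTF names (first character a letter or '_', the rest letters, digits or
'_'); the real kernel code differs in detail (e.g. length limits, and some
kinds allow other characters or an empty name), and `toy_name_valid()` and
the string table are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up base string section of build#1 (offset 0 is the empty string). */
static const char build1_strs[] = "\0int\0sock";

/* Rough identifier check on a name_off: the offset must lie inside the
 * string section, the first character must be a letter or '_', and the
 * rest letters, digits or '_'. An empty string therefore fails.
 * Not the kernel's exact rules. */
static bool toy_name_valid(const char *strs, size_t len, unsigned int off)
{
	if (off >= len)
		return false;

	const char *p = strs + off;
	unsigned char c = (unsigned char)*p;

	if (!isalpha(c) && c != '_')
		return false;
	for (p++; *p; p++) {
		c = (unsigned char)*p;
		if (!isalnum(c) && c != '_')
			return false;
	}
	return true;
}
```

Under these assumptions, offset 9 (the empty string left behind by the shift)
is rejected, while an offset that lands mid-string, such as 6 ("ock"), would
pass, so in this model a shifted base can also yield wrong names without any
error.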


Best,
Shung-Hsi Yu

1: https://bugzilla.opensuse.org/show_bug.cgi?id=1194501
2: my guess is that the rebuild was triggered by a compiler toolchain update,
    but I wasn't able to pin down exactly what changed





