Re: F36 Change: Package information on ELF objects (System-Wide Change proposal)

On Mon, Oct 25, 2021 at 07:26:47PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Mon, Oct 25, 2021 at 03:09:00PM -0400, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/Package_information_on_ELF_objects

== Summary ==
All binaries (executables and shared libraries) are annotated with an
ELF note that identifies the rpm for which this file was built. This
allows binaries to be identified when they are distributed without any
of the rpm metadata. `systemd-coredump` uses this to log package
versions when reporting crashes.

This is a resubmission of the proposal for F35 which was (narrowly)
rejected at the time. We added copious descriptions of motivations
for the change, and analysis of impact on upgrades, and more links
to documentation.

Zbyszek

== Owner ==
* Name: [[User:Zbyszek|Zbigniew Jędrzejewski-Szmek]]
* Email: zbyszek@xxxxxxxxx
* Name: Lennart Poettering
* Email: mzsrqben@xxxxxxxxxxxx


== Detailed Description ==
People mix binaries (programs and libraries) from different
distributions (for example using Fedora containers on Debian or vice
versa), and distribute binaries without packaging metadata (for
example by stripping everything except the binary from a container
image, also removing `/usr/lib/.build-id/*`), compile their own rpm
packages (for internal distribution and installation), and compile and
distribute their own binaries. Sometimes we need to introspect a
binary and figure out its provenance, for example when a program
crashes and we are looking at a core dump, but also when we have a
binary without the packaging metadata. When the need to introspect a
binary arises, we have some very good mechanisms to show the
provenance: when a file is installed through the package manager we
can directly list the providing package, but even without this we can
use build-ids embedded in the binary to uniquely identify the
originating build. But those mechanisms work best when we're in the
realm of a single distribution. In particular, build-ids can be easily
tied to a source rpm, but only when the source rpm is part of
the distribution and the build-id was registered in the appropriate
database which maps build-ids to real package names. When we move
outside of the realm of a single distribution, it can be hard to
figure out where a given binary originates from. If we know that a
binary is from a given distribution, we may be able to use some
distro-specific mechanism to figure out this information. But those
mechanisms will be different for different distributions and will
often require network access. With this change we aim to provide a
mechanism that is very simple, provides "human-readable" origin
information without further processing, is portable across distros,
and works without network access.

The directly motivating use case is display of core dumps. Right now
we have build-ids, but those are just opaque hexadecimal numbers that
are not meaningful to users. We would like to immediately list
versions of packages involved in the crash (including both the program
and any libraries it links to). It is not enough to query the rpm
database to do the equivalent of `rpm -qf …`: very often programs
crash after some packages have been upgraded and the binaries loaded
into memory are not the binaries that are currently present on disk,
or when through some mishap, the binaries on disk do not match the
installed rpms.  A mechanism that works without rpm database lookup or
network access allows this information to be shown immediately in
`coredumpctl` listings and journal entries about the crash. This
includes crashes that happen in the initrd and sandboxed containers.

A second motivating use case is when users distribute their own
binaries and would like to collect crash information. Build-ids are a
solution that is technically possible, but easy to get wrong in
practice: users would need to immediately record the build-id after
the build and store the mapping to program names, versions, and build
number in some database. It's much easier to be able to record
something during the build in the build product itself.

A third motivating use case is the general mixing of Fedora binaries
with programs and libraries from different distributions, both with
our binaries being used as the base for foreign binaries, and the
other way around. Whilst most distributions provide some mechanism to
figure out the source build information, those mechanisms vary by
distribution and may not be easy to access from a "foreign" system.
Such mixing is expected with containers, flatpaks, snaps, Python
binary wheels, anaconda packages, and quite often when somebody
compiles a binary and puts it up on the web for other people to
download.

We propose a new mechanism which is designed to be very simple but
extensible: a small JSON document is embedded in a section in the ELF
binary. This document can be easily read by a human if necessary, but
it is also well-defined and can be processed programmatically. For
example, `systemd-coredump` will immediately make use of this to
display package ''nevra'' information for crashes. The format is also
easy to generate, so it can be added to any build system, either using
the helpers that we provide or even reimplemented from scratch.

For the case where we mix binaries from different distros (the third
motivating use case above), this approach is the most useful when this
system is used by all distros and even non-distro builds. The more
widely it is used, the more useful it becomes. The specification was
developed in collaboration with Debian developers, and we hope that
Fedora and Debian will lead the way for this to become as widely used
as build-ids. But even if the information is only available from some
distros, it is still useful, except that fallback mechanisms need to
be implemented.

=== Existing system: `.note.gnu.build-id` ===

We already have build-ids: every ELF object has a `.note.gnu.build-id`
note, and given a core file, we can read the build-id and look it up
in the rpm database (`dnf repoquery --whatprovides debuginfo(build-id)
= …`) to map it to a package name.
Build-ids are unique and compact and very generic and work as expected
in general. But they have some downsides:
* build-ids are not very informative for users. Before the build-id is
converted back to the appropriate package, it's completely opaque.
* build-ids require a working rpm database or an internet connection
to map to the package name.

Three important cases:
* minimal containers: the rpm database is not installed in the
containers. The information about build-ids needs to be stored
externally, so package name information is not available immediately,
but only after offline processing. The new note doesn't depend on the
rpm db in any way.
* handling of a core from a container, where the container and host
have different distros
* self-built and external packages: unless a lot of care is taken to
keep access to the debuginfo packages, this information may be lost.
The new note is available even if the repository metadata gets lost.
Users can easily provide equivalent information in a format that makes
sense in their own environment. It should work even when rpms and debs
and other formats are mixed, e.g. during container image creation.

=== New system: `.note.package` ===

The new note is created and propagated similarly to
`.note.gnu.build-id`. The difference is that we inject the information
about package ''nevra'' from the build system.

The implementation is very simple: `%{build_ldflags}` are extended
with a command to insert a custom note as a separate section in an ELF
object. See [https://github.com/systemd/package-notes/blob/main/hello.spec
hello.spec] for an example. This is done in the default macros, so all
packages that use the prescribed link flags will be affected.

The note is a compact JSON string. This makes the format trivially
extensible (new fields can be added at will) and easy to process
(JSON is extremely popular and parsers are widely available).
Using a single field rather than a set of separate notes is more
space-efficient: with multiple fields, the padding and alignment
requirements cause unnecessary overhead.
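
To make the space argument concrete, here is a minimal Python sketch that compares one note carrying the whole JSON document with a hypothetical layout that uses one note per field. The helper names (`align4`, `note_record_size`) and the per-field layout are illustrative assumptions and not part of the specification; the field values and the `FDO` owner name are taken from the `hello` example in this proposal.

```python
import json

def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3

def note_record_size(owner: str, desc: bytes) -> int:
    """Size of one ELF note record: 12-byte header, padded owner name, padded descriptor."""
    return 12 + align4(len(owner) + 1) + align4(len(desc))

# Field values copied from the hello example in this proposal.
fields = {
    "type": "rpm",
    "name": "hello",
    "version": "0-1.fc35.x86_64",
    "osCpe": "cpe:/o:fedoraproject:fedora:33",
}

# Compact JSON (no spaces after separators), NUL-terminated.
compact = json.dumps(fields, separators=(",", ":")).encode() + b"\0"
single = note_record_size("FDO", compact)

# Hypothetical alternative: one note per field value, each NUL-terminated.
separate = sum(note_record_size("FDO", v.encode() + b"\0") for v in fields.values())

print(single, separate)
```

In this toy layout the per-field notes come out larger even though they drop the key names, because every record repeats the header and the owner name and is padded separately.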

The system was designed with cross-distro collaboration and is
flexible enough to identify binaries from different packaging formats
and build systems (rpms, debs, custom binaries).

See https://systemd.io/COREDUMP_PACKAGE_METADATA/ for detailed
description of the format.

One of the advantages of using an ELF note, as opposed to say a series
of extended attributes on the binary itself, is that the ELF note gets
automatically captured and copied into a core file by the kernel.
Extended attributes would have to be copied manually, which might not
even be possible because the binary on disk may have been removed by
the time the crash is analyzed.

The overhead is about 200 bytes for each ELF object.
Overall, we have about 33200 files in `/usr/s?bin/` and about 36600
`.so` files (F35, single architecture; results from
`dnf repoquery -l 2>/dev/null | rg '^/usr/s?bin/' | sort -u | wc -l` and
`dnf repoquery -l 2>/dev/null | rg '^/usr/lib64/.*\.so$' | sort -u | wc -l`).
If we do this for the whole distro, we get 69800 × 200 B ≈ 14 MB (13.3 MiB).
For a typical installation, we can expect about 300–400 kB.
Thus the overhead in additional space used is negligible (also see the
Feedback section for more discussion).
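
The estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted above; the helper `objects_for` and the reading of "kB" as 1000 bytes are assumptions made for illustration.

```python
# Figures quoted in the text above (F35, single architecture).
FILES_IN_BIN = 33200      # files under /usr/bin and /usr/sbin
SHARED_OBJECTS = 36600    # .so files under /usr/lib64
NOTE_BYTES = 200          # approximate per-object overhead

total_objects = FILES_IN_BIN + SHARED_OBJECTS
total_bytes = total_objects * NOTE_BYTES

def objects_for(budget_bytes: int) -> int:
    """Number of ELF objects whose notes fit in the given byte budget."""
    return budget_bytes // NOTE_BYTES

print(total_objects, total_bytes, round(total_bytes / 2**20, 1))
print(objects_for(300_000), objects_for(400_000))
```

Under these assumptions the whole-distro total is just under 14 million bytes, and the 300–400 kB figure corresponds to roughly 1500–2000 ELF objects in an installation.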

Precise measurements TBD once this is turned on and we have real
measurements for a larger number of builds.

=== Examples ===
<pre>
$ objdump -s -j .note.package build/libhello.so

build/libhello.so:     file format elf64-x86-64

Contents of section .note.package:
 02ec 04000000 63000000 7e1afeca 46444f00  ....c...~...FDO.
 02fc 7b227479 7065223a 2272706d 222c226e  {"type":"rpm","n
 030c 616d6522 3a226865 6c6c6f22 2c227665  ame":"hello","ve
 031c 7273696f 6e223a22 302d312e 66633335  rsion":"0-1.fc35
 032c 2e783836 5f363422 2c226f73 43706522  .x86_64","osCpe"
 033c 3a226370 653a2f6f 3a666564 6f726170  :"cpe:/o:fedorap
 034c 726f6a65 63743a66 65646f72 613a3333  roject:fedora:33
 035c 227d0000                             "}..
</pre>
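
The section contents shown above follow the usual ELF note layout: a name size, a descriptor size, and a type, followed by the padded owner name and the padded descriptor. As a minimal sketch, the following Python decodes the bytes copied from the `objdump` output above. It assumes a little-endian object (as on x86-64) and a section holding a single note record; `parse_package_note` and `SAMPLE` are illustrative names, not part of any tool mentioned here.

```python
import json
import struct

def parse_package_note(section: bytes) -> dict:
    """Decode one ELF note record and return owner, type, and JSON payload."""
    # Note header: three little-endian 32-bit words (assumption: LE target).
    namesz, descsz, note_type = struct.unpack_from("<III", section, 0)
    offset = 12
    owner = section[offset:offset + namesz].rstrip(b"\0").decode("ascii")
    # The owner name is padded to a 4-byte boundary before the descriptor.
    offset += (namesz + 3) & ~3
    # The descriptor is the JSON text, possibly followed by NUL bytes.
    desc = section[offset:offset + descsz].rstrip(b"\0")
    return {"owner": owner, "type": note_type, "payload": json.loads(desc)}

# Bytes copied from the objdump listing of build/libhello.so above.
SAMPLE = bytes.fromhex(
    "04000000 63000000 7e1afeca 46444f00"
    "7b227479 7065223a 2272706d 222c226e"
    "616d6522 3a226865 6c6c6f22 2c227665"
    "7273696f 6e223a22 302d312e 66633335"
    "2e783836 5f363422 2c226f73 43706522"
    "3a226370 653a2f6f 3a666564 6f726170"
    "726f6a65 63743a66 65646f72 613a3333"
    "227d0000"
)

note = parse_package_note(SAMPLE)
print(note["owner"], hex(note["type"]), note["payload"])
```

A real reader would first locate the `.note.package` section (or the corresponding note in a core file) and would have to handle big-endian objects and multiple notes; the sketch skips all of that.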

<pre>
$ readelf --notes build/hello | grep "description data" |
    sed -e "s/\s*description data: //g" -e "s/ //g" | xxd -p -r | jq
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x10de
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x10af
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x119f
{
  "type": "rpm",
  "name": "hello",
  "version": "0-1.fc35.x86_64",
  "osCpe": "cpe:/o:fedoraproject:fedora:33"
}
</pre>

<pre>
$ coredumpctl info
           PID: 44522 (fsverity)
...
       Package: fsverity-utils/1.3-1
      build-id: ac89bf7175b04d7eec7f6544a923f45be111f0be
       Message: Process 44522 (fsverity) of user 1000 dumped core.

                Found module
/home/bluca/git/fsverity-utils/libfsverity.so.0 with build-id:
fa40fdfb79aea84167c98ca8a89add9ac4f51069
                Metadata for module
/home/bluca/git/fsverity-utils/libfsverity.so.0 owned by FDO found: {
                "packageType" : "deb",
                "package" : "fsverity-utils",
                "packageVersion" : "1.3-1"
                }

                Found module linux-vdso.so.1 with build-id:
aba08e06103f725e26f1d7c178fb6b76a564a35d
                Found module libpthread.so.0 with build-id:
e91114987a0147bd050addbd591eb8994b29f4b3
                Found module libdl.so.2 with build-id:
d3583c742dd47aaa860c5ae0c0c5bdbcd2d54f61
                Found module ld-linux-x86-64.so.2 with build-id:
f25dfd7b95be4ba386fd71080accae8c0732b711
                Found module libcrypto.so.1.1 with build-id:
749142d5ee728a76e7cdc61fd79d2311a77405a2
                Found module libc.so.6 with build-id:
18b9a9a8c523e5cfe5b5d946d605d09242f09798
                Found module fsverity with build-id:
ac89bf7175b04d7eec7f6544a923f45be111f0be
                Metadata for module fsverity owned by FDO found: {
                "packageType" : "deb",
                "package" : "fsverity-utils",
                "packageVersion" : "1.3-1"
                }

                Stack trace of thread 44522:
                #0  0x00007fe7c8af26f4 __GI___nanosleep (libc.so.6 + 0xc66f4)
                #1  0x00007fe7c8af262a __sleep (libc.so.6 + 0xc662a)
                #2  0x00005608481407dd main (fsverity + 0x27dd)
                #3  0x00007fe7c8a5009b __libc_start_main (libc.so.6 + 0x2409b)
                #4  0x000056084814094a _start (fsverity + 0x294a)
</pre>

== Feedback ==
See [https://github.com/systemd/systemd/issues/18433 systemd issue
#18433] for upstream discussion and implementation proposals.

=== Concerns about additional changes to files ===

<pre>
17:32:30 <Eighth_Doctor> I think zbyszek underestimates how much of a
problem it is to stamp every ELF binary with ''nevra'' data
17:32:44 <mhroncok> zbyszek: so, assuming python has ~100 ELF .so
files and I change one text file
17:33:22 <mhroncok> (ignore for the time being that the .so files
often changed because of toolchain updates and assume they are stable)
</pre>

I tested this with python3.10. So far there are 13 builds of that
package in F35:
`python3.10-3.10.0-1.fc35`,
`python3.10-3.10.0~a6-1.fc35`,
`python3.10-3.10.0~a6-2.fc35`,
`python3.10-3.10.0~a7-1.fc35`,
`python3.10-3.10.0~b1-1.fc35`,
`python3.10-3.10.0~b2-2.fc35`,
`python3.10-3.10.0~b2-3.fc35`,
`python3.10-3.10.0~b3-1.fc35`,
`python3.10-3.10.0~b4-1.fc35`,
`python3.10-3.10.0~b4-2.fc35`,
`python3.10-3.10.0~b4-3.fc35`,
`python3.10-3.10.0~rc1-1.fc35`,
`python3.10-3.10.0~rc2-1.fc35`.
I extracted the builds (for `.x86_64`) and made a list of all `.so`
files (1368 files), and calculated sha256 hashes for them. No two
files repeat, there are 1368 distinct hashes. So the files are
'''already''' different between builds and the additional proposed
metadata will not make a significant difference.
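
A comparison of this kind can be reproduced with a short script. The following is a minimal sketch, not the script that was actually used: it assumes the builds have already been unpacked under one directory, treats every regular file whose name ends in `.so` as a shared object, and uses the hypothetical helper name `summarize_shared_objects`.

```python
import hashlib
from pathlib import Path

def summarize_shared_objects(root):
    """Return (file count, distinct sha256 digests, distinct sizes) for *.so files under root."""
    digests = set()
    sizes = set()
    count = 0
    for path in sorted(Path(root).rglob("*.so")):
        # Skip symlinks and directories so each real file is counted once.
        if path.is_symlink() or not path.is_file():
            continue
        data = path.read_bytes()
        digests.add(hashlib.sha256(data).hexdigest())
        sizes.add(len(data))
        count += 1
    return count, len(digests), len(sizes)
```

Calling it on the directory holding the extracted builds returns the three numbers to compare; if the file count equals the number of distinct digests, no file is byte-identical between builds.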

Note that this range of Python versions encompasses periods when the
package is under development and undergoes significant changes (alpha
versions), and when it's only undergoing small changes (rc versions).

The fact that we get different files in each build is not surprising,
because files embed build-ids which differ between builds. But even if
we ignore those, binaries generally differ between builds. Even sizes
tend to vary between builds: there are 636 distinct `.so` file sizes,
i.e. on average any given size only repeats twice (presumably most
often for the same file). Running `diffoscope` on `.so` files from
different builds shows minor changes in the assembly which I did not
analyze further.

If people have specific questions, for example about overhead in some
scenario, I'd be happy to answer them. Until now, the issues that were
raised were very vague, so it's impossible to answer them.

=== Why not just use the rpm database? ===

<pre>
17:34:33 <dcantrell> The main reason for this appears to be that we
need the RPM db locally to resolve build-ids to package names. But
since containers wipe /var/lib/rpm, we can't do that. So the solution
is to put the ''nevra'' in ELF metadata?
17:34:39 <dcantrell> That feels like the wrong approach.
</pre>

First, there are legitimate reasons to strip packaging metadata from
images. For example, for an initrd image from rpms, I get 117 MB of
files (without compression), and out of this `/var/lib/rpm` is 5.9 MB,
and `/var/lib/dnf` is 4.2 MB. This is an overhead of 9%. This is ''not
much'', but still too much to keep in the image unless necessary.
Similar ratios will happen for containers of similar size. Reducing
image size by one tenth is important. There is no `rpm` or `dnf` in
the image, so the package database is not even usable without external
tools.

As discussed on IRC
(https://meetbot.fedoraproject.org/teams/fesco/fesco.2021-05-11-17.01.log.html),
the containers ''we'' build don't wipe this metadata, but custom
Dockerfiles do that.

Second, as described in the Detailed Description section above, not everybody and
everything uses rpm. The Fedora motto is "we make an operating system
and we make it easy for you to do useful stuff with it" (and yes, this
is an actual quote from the official docs), and this stuff involves
reusing our binaries in containers and custom installations and
whatnot, not just straightforward installations with `dnf`. And in the
other direction, people will build their own binaries that are not
packaged as rpms. But it is still important to be able to figure out
the exact version of a binary, especially after it crashes.

=== Why do this in Fedora? ===

<pre>
17:36:49 <mhroncok> I don't understand how non-rpm distros and custom
built binaries are affected by our rpm-build environment :/
</pre>

The idea is that we inject this into our build system, and Debian
injects this into their build system, and so on… As mentioned, this is
a cross-distro effort. Also, people can use it in their custom build
systems if they build and distribute binaries internally. The scheme
would obviously be most useful if used comprehensively, but it's still
useful when available partially. We hope that Fedora can lead the way.
(This is similar to build-ids: when initially adopted, they were used
only by some distros, but were useful even then. Nowadays, with
comprehensive adoption, they are even more useful.)

https://hpc.guix.info/blog/2021/09/whats-in-a-package/ contains a nice
description of a pathological case of packaging hacks and binary
redistribution. When trying to unravel something like this,
information embedded directly in the binaries would be quite useful.


== Benefit to Fedora ==
A simple and reliable way to gather information about package versions
of programs is added.
It enhances, instead of replacing, the existing mechanisms.
It is particularly useful when reporting crash dumps, but can also be
used for image introspection and forensics, license checks and
version scans on containers, etc.

If we adopt this in Fedora, Fedora leads the way on implementing the
standard. Fedora binaries used in any context can be easily
recognized. Fedora binaries provide a better basis to build things.

If other distros adopt this, we can introspect and report on those
binaries easily within the Fedora context. For example, when somebody
is using a container with some programs that originate in the Debian
ecosystem, we would be able to identify those programs without tools
like `apt` or `dpkg-query`. Core dump analysis executed in the Fedora
host can easily provide useful information about programs from foreign
builds.

== Implementation in Other Distributions ==
=== Microsoft CBL-Mariner ===
[https://en.wikipedia.org/wiki/CBL-Mariner CBL-Mariner] is an
[https://github.com/microsoft/CBL-Mariner open source] Linux
distribution created by Microsoft, targeted at first-party and
container workloads on Azure. It is used both as a container runner
host and a base container image.
Mariner adopted the ELF stamping packaging metadata spec in
[https://github.com/microsoft/CBL-Mariner/blob/1.0/SPECS/mariner-rpm-macros/gen-ld-script.sh
version 1.0], initially to add OS metadata, and package-level metadata
will be added in a following release.
=== Debian ===
A package-level proof-of-concept is included in the
[https://github.com/systemd/package-notes/blob/main/dh_package_notes
package-notes] repository.
A [https://salsa.debian.org/bluca/debhelper/-/tree/notes_metadata
system-level proof-of-concept] that enables ELF stamping by default in
all builds implicitly will be proposed for adoption in the future.

== Scope ==
* Proposal owners:
** create a specification (First version DONE:
[https://systemd.io/COREDUMP_PACKAGE_METADATA
COREDUMP_PACKAGE_METADATA]. We might need to make some adjustments
based on the deployment in Fedora, but no big changes are expected.)
** write a script to generate the package note (First version DONE:
[https://github.com/systemd/package-notes/blob/main/generate-package-notes.py
generate-package-notes.py])
** provide a patch for `redhat-rpm-config` to insert appropriate
compilation options
** extend systemd's coredumpctl to extract and display this
information (DONE: [https://github.com/systemd/systemd/pull/19135 PR
#19135], available in systemd-249)
** submit pull request to Packaging Guidelines

* Other developers:
** possibly add support in abrt?

* Release engineering: There should be no impact.

* Policies and guidelines:
The new flags should be mentioned in Packaging Guidelines.

* Trademark approval: N/A (not needed for this Change)

* Alignment with Objectives:
It might be relevant for Minimization. Even though it increases the
image size a tiny bit, it makes minimized images work a bit better.

== Upgrade/compatibility impact ==
No impact.

== How To Test ==
<pre>
$ bash -c 'kill -SEGV $$'
$ coredumpctl
TIME                            PID  UID  GID SIG     COREFILE EXE            SIZE PACKAGE
Mon 2021-03-01 14:37:22 CET  855151 1000 1000 SIGSEGV present  /usr/bin/bash 51.7K bash-5.1.0-2.fc34.x86_64
</pre>

== User Experience ==
`coredumpctl` should display information about package versions.

`readelf --notes` or similar tools can be used on `.so` files and
compiled programs
to extract the JSON blurb that describes the originating package.

== Dependencies ==
None.

== Contingency Plan ==

* Contingency mechanism: Remove the new compilation flags. Rebuild any
packages that were built with the new flags.
* Contingency deadline: Beta freeze.
* Blocks release? No.

== Documentation ==
* https://systemd.io/COREDUMP_PACKAGE_METADATA/
* https://github.com/systemd/package-notes

See also [[Changes/DebuginfodByDefault]].

Thanks for revising the change proposal and filling in more details.
After reading through it, I have some questions:

1) The proposal notes that users tend to combine built packages from
different distributions.  Even in the current environment, do we care
about those use cases without also getting a reproducer for Fedora?
For me, a bug report that implies the user is running a binary from
some other distribution will lead me to ask "ok, but does this happen with the
packages provided in Fedora?  If so, how can I reproduce the problem
locally?"  So while these scenarios are described in the proposal, are
they something that Fedora is trying to support?

2) The proposal is built around using the package NVR to indicate
where it came from.  But those names aren't unique.  In some cases
it'll work, but in cases where the noted package cannot be found or
has been reaped or is just otherwise unavailable, you're back to
asking for a reproducer on a Fedora release, right?  Does the NVR data
save much work over having build-ID plus debuginfod?  That's not
rhetorical.  I don't have many bug reports that are not resolvable by
just talking through a reproducer and seeing it happen locally, but I
know I'm not a control case.

3) The proposal notes making crash reporting more user-readable.  NVRs
instead of Build-IDs, for instance.  Why can't systemd ask debuginfod
for that information for reporting?  Why does this need to be embedded
in the ELF objects?  If it's a value-add, then it could happen when
debuginfod is set up, and otherwise fall back on the current
reporting mechanism.

Thanks,

--
David Cantrell <dcantrell@xxxxxxxxxx>
Red Hat, Inc. | Boston, MA | EST5EDT