Re: F36 Change: Package information on ELF objects (System-Wide Change proposal)

Zbigniew Jędrzejewski-Szmek <zbyszek@xxxxxxxxx> · Mon, 8 Nov 2021 21:08:04 +0000

On Mon, Nov 08, 2021 at 02:06:17PM -0500, David Cantrell wrote:
> I was thinking more about this proposal over the past weekend and
> where I keep ending up is that this is really optimizing for a small
> use case by touching ELF metadata all over the system. And that
> strikes me as pretty invasive, so is it worth the tradeoffs and risks
> and such?

I think this is a point of confusion/disagreement. Why do you say that the
change is "invasive"?

Is it the extra size? It's completely dwarfed by many other routine
changes that we do, from new compiler speed optimizations that
increase size a bit, through new translation strings, through new
translation languages, through any additional functionality in any of
the frequently used libs, to the recent change from XZ to ZSTD. I'm
quite sure that if this level of change happened implicitly through
e.g. a compiler change, nobody would ever notice.

Is it the extra compilation time? The answer is similar here: the
extra cost of inserting a few hundred bytes is completely dwarfed by
the compilation and linkage times, especially with LTO.

Is it the extra complexity? It's a very very simple generator and one
additional linkage option. This hasn't been widely deployed, so of
course there might be some complications, but there is really very little
that can break here, and even if there's some issue, I'm fairly
confident we'll be able to fix it. So far the details of
implementation haven't really been questioned. If it turns out that the
current generator implementation is a problem, I'll be happy to
rewrite it, maybe even in lua, so that it can be done as part of the
spec generation without any external code. (Though I'd very much prefer
to wait with such optimizations until the format and contents have
seen more exposure; right now this would be premature optimization.)

And the note is inserted during package build, and then isn't consumed
by anything, until maybe the program crashes. Parsing the note is
completely trivial in complexity compared to the stack unwinding and
backtrace generation that happens in coredump analyzers. So it really
has no effect during normal runtime, and a very small one in analysis
programs for which the note is intended.

So I really don't understand the question about tradeoffs and risks.

> It has been stated multiple times that the information needs to be
> in the ELF header because containers and images may lack an RPM
> database.  Fair, but what about the users that both want a container
> and image without the RPM database and systemd-coredump?

Please, don't read too much into the part about containers. Personally,
I don't care about containers too much, I care about the other cases,
in particular programs crashing in the initrd.

systemd-coredump is also not particularly important here: it's just one
consumer of this, even though it'll probably be an important one in
the context of Fedora. We are trying to build a generic standard, and
new uses that we can't even predict now will show up over time. The
format is open-ended, so it seems likely that people will come up
with new stuff to put there in their own builds.

systemd-coredump generally is *not* present in container images.

> They still have all of their ELF files with this information that
> they removed in other ways. Do we provide those users with a script
> to strip .gnu.notes from everything or is that even a use case of
> concern?

FWIW, you can remove the note, e.g. with patchelf or objcopy. 
Maybe I'm misunderstanding, but I don't see why this would be a concern,
and in particular why it would be *our* concern. It'd be like stripping
.note.gnu.build-id: technically possible, but I've never heard that come
up and I don't see why it would.
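
For anyone who does want to do that, here is a minimal sketch of the objcopy variant, wrapped in Python. It is an illustration only, not something the proposal ships: the helper names and the example path are hypothetical, and it assumes binutils' objcopy is installed and that the section is named .note.package. With a single file argument, objcopy rewrites the file in place.

```python
import shutil
import subprocess


def strip_note_command(path, section=".note.package"):
    """Build an objcopy invocation that rewrites `path` in place without `section`."""
    # --remove-section is a standard objcopy option; the section name is an
    # assumption based on the name used in this thread.
    return ["objcopy", f"--remove-section={section}", path]


def strip_note(path, section=".note.package"):
    """Run objcopy on `path`; raises if objcopy is missing or exits non-zero."""
    if shutil.which("objcopy") is None:
        raise RuntimeError("objcopy (binutils) is not installed")
    subprocess.run(strip_note_command(path, section), check=True)


# Only print the command here; call strip_note() on a real file to apply it.
print(" ".join(strip_note_command("/usr/bin/example")))
```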

> Efforts to get the system very small for container and image use has
> been a goal for a while.  And sure we're not talking about a lot of
> data, but that's now.  The size of everything only grows, so is that
> something to consider with the implementation of this feature?

Realistically, the sizes here are too small for this to matter.
For a container image with a few dozen libraries and a few executables,
we are talking about kilobytes of data compared to hundreds of megs
for a container built from distro packages. Anyone who really optimizes
container size to the level where this would matter is never going to
use our distro binaries but will use custom builds.

> Another thing I thought about were reproducible builds.  Does this
> impact reproducible builds and if so, how do we handle that?

This does not impact reproducible builds at all.
See the answer in
https://fedoraproject.org/wiki/Changes/Package_information_on_ELF_objects#Won.27t_this_affect_the_Reproducible_Builds_effort.3F .

> I would feel more comfortable with this proposal if the data for
> systemd-coredump was not part of the ELF metadata.  Or if it
> absolutely must be part of the ELF metadata, users should know how it
> can be removed.

Yes, it must be part of the ELF metadata. It's the only place that
makes sense technically. It's the same with .note.gnu.build-id:
a note is used because it is automatically loaded by the linker when
the program is loaded, so that it can be retrieved from the core dump.
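
To give a feel for how little work reading such a note takes, here is a rough sketch that walks a buffer of ELF notes and decodes a JSON payload. It is not the systemd-coredump implementation. The owner string "FDO" and the type value 0xcafe1a7e are taken from the published package metadata specification as I understand it, not from this mail; little-endian layout and 4-byte padding are assumed; and the sample fields are made up for the demonstration.

```python
import json
import struct

# Assumed from the published spec, not stated in this thread.
FDO_PACKAGE_NOTE_TYPE = 0xCAFE1A7E
NT_GNU_BUILD_ID = 3


def align4(n):
    """Round n up to the next multiple of 4 (assumed note padding)."""
    return (n + 3) & ~3


def build_note(owner, note_type, desc):
    """Serialize one note: 12-byte header, padded owner name, padded payload."""
    name = owner + b"\0"
    header = struct.pack("<III", len(name), len(desc), note_type)
    return (header
            + name.ljust(align4(len(name)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))


def iter_notes(data):
    """Yield (owner, type, payload) for each note in a raw note buffer."""
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, note_type = struct.unpack_from("<III", data, offset)
        offset += 12
        name = data[offset:offset + namesz].rstrip(b"\0")
        offset += align4(namesz)
        desc = data[offset:offset + descsz]
        offset += align4(descsz)
        yield name, note_type, desc


def find_package_metadata(data):
    """Return the decoded JSON of the first package note, or None."""
    for name, note_type, desc in iter_notes(data):
        if name == b"FDO" and note_type == FDO_PACKAGE_NOTE_TYPE:
            return json.loads(desc.rstrip(b"\0").decode("utf-8"))
    return None


# Hypothetical sample: a dummy build-id note followed by a package note.
payload = json.dumps(
    {"type": "rpm", "name": "example", "version": "1.0-1.fc36"}
).encode()
blob = (build_note(b"GNU", NT_GNU_BUILD_ID, bytes(20))
        + build_note(b"FDO", FDO_PACKAGE_NOTE_TYPE, payload))
metadata = find_package_metadata(blob)
print(metadata)
```

In a real analyzer the buffer would come from the note data of the crashed process rather than from build_note(), which exists here only to make the sketch self-contained.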

The scenario we are talking about here is the following: a user
starts a program, and while the program is running, performs a dnf
upgrade. The program then crashes. At this point, all files *on disk*
have the new contents, 'rpm -qf' will report new package versions,
any extended attributes or other metadata that is set on files also
pertains to the upgraded packages. The only thing that allows us to
recognize the real version of the crashing program is the data in
memory, either .note.gnu.build-id or the proposed .note.package.
(OK, strictly speaking we can try looking at function addresses
and look through binaries until we find a matching one, but let's say
that this is not the best use of maintainer time.)

A second scenario is that a program is started *during* a dnf upgrade.
(This sounds like a narrow case, but the sad reality is that programs
often crash at such a time, because either libraries are in unexpected
versions, or files on disk are mismatched, or because the environment
in rpm scriptlets is not the same as in a normal invocation.) If that program
crashes, we are in the same situation: the info on disk is not useful.

So yeah, we put this in an ELF note because the note describes the
program code contained in the ELF file, and by attaching it there the
two are propagated together and the metainformation is available along
with the code.

> I would also vote for a format other than JSON, but that's just me.

We did investigate other formats first, in particular separate notes
for separate fields. Fields are aligned, and need headers, so the
overall cost was significantly higher. And when we switch to a single
note, we need some mechanism to combine multiple fields. JSON is
not great for some use cases, but for interchange of machine-parseable
data it's pretty good: simple, well-understood, widely-used, even the
lack of comments is more of an advantage than a hindrance for us.

Please let me know if there are further questions.

Zbyszek