Re: F36 Change: Package information on ELF objects (System-Wide Change proposal)

David Cantrell <dcantrell@xxxxxxxxxx> · Mon, 8 Nov 2021 14:06:17 -0500

On Thu, Oct 28, 2021 at 09:30:18PM +0200, Lennart Poettering wrote:
On Do, 28.10.21 12:10, David Cantrell (dcantrell@xxxxxxxxxx) wrote:

Thanks for revising the change proposal and filling in more details.
After reading through it, I have some questions:

1) The proposal notes that users tend to combine built packages from
different distributions.  Even in the current environment, do we care
about those use cases without also getting a reproducer for Fedora?

I'd see it this way: ultimately, if this gets adopted by multiple
distros, this annotation will help us separate out the reports by
distro and then ignore everything but Fedora. I.e., if someone deploys
a Debian or Ubuntu container on a Fedora host, that is something
we shouldn't be bothered with supporting. But right now a coredump
generated that way won't tell us much about the situation. Once
this spec is adopted this becomes easy: if we get a report we'll
immediately see that the code that aborted was actually from a
different distro, and we can point the reporter to that and tell them
politely to ask the other distro for help, or alternatively invest the
time and reproduce the issue with the binary provided by Fedora
instead.

So, having this info around allows us to be more efficient with
"not caring" for non-Fedora issues.
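
As a rough illustration of that triage, here is a minimal Python sketch. It
assumes the package note has already been decoded into a dict and that it
carries an "os" field naming the distro. The field name and the triage()
helper are hypothetical and are not defined anywhere in this thread.

```python
def triage(note):
    """Route a crash report based on its decoded package note.

    `note` is assumed to be a dict decoded from the package metadata,
    or None when the crashing binary carried no such metadata.
    """
    if note is None:
        return "unknown origin: ask the reporter where the binary came from"
    # The "os" key is an assumption made for this sketch.
    distro = str(note.get("os", "")).lower()
    if distro == "fedora":
        return "ours: handle as a Fedora bug"
    return f"not ours: point the reporter to {distro or 'the vendor'}"
```

With something like this, a report whose note says "debian" is redirected
immediately, while a report without any note still needs a manual question.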

For me, a situation like that, where a user has submitted a bug
report that implies using a binary from some other distribution,
leads me to ask "ok, but does this happen with the packages provided
in Fedora?  If so, how can I reproduce the problem locally?"  So while
these scenarios are described in the proposal, are they something
that Fedora is trying to support?

Well, I don't think Fedora has to care about crashes in other distros'
binaries. We have more than enough to look after anyway. But I do think
we should make it easier to detect this situation and to provide
helpful pointers on how to find someone else who might be
interested, or what to do to make Fedora interested.

3) The proposal notes making crash reporting more user-readable.  NVRs
instead of Build-IDs, for instance.  Why can't systemd ask debuginfod
for that information for reporting?  Why does this need to be embedded
in the ELF objects?  If it's a value-add, it could happen when
debuginfod is set up and otherwise fall back on the current
reporting mechanism.

We want to be able to do things generically in an offline fashion in
systemd-coredump. I.e. we run the coredump analyzer in a tight
sandbox, and we want quick answers without relying on the network.

The goal of systemd-coredump is to make a coredump something that is
primarily a relatively cheap log event, and where we do analysis as
much as possible locally, automatically, securely, privately and
quickly. If we always talked to the network we'd have to open our
sandbox quite a bit, we'd be dependent on external services, we'd leak
some data to the outside, we'd be unreliable and slower, and all that
even though we are ultimately interested in only a single string of
data.

(I think debuginfod is excellent, but I think it would probably be a
consumer of this spec, not a replacement. For example, consider that
the spec already has a suggested field 'debugInfoUrl', which would
tell debugging tools which debuginfod servers to contact to
acquire extended debug info data.)
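
To make the offline lookup concrete, here is a minimal Python sketch that
builds and decodes an ELF note record in memory, with no network access.
It relies on the standard note layout (namesz, descsz, type, then the name
and descriptor, each padded to 4 bytes). The owner string "FDO", the type
value 0xcafe1a7e, the little-endian byte order and every field other than
'debugInfoUrl' are assumptions made for illustration, not taken from this
message.

```python
import json
import struct


def pad4(n):
    """Round n up to the next multiple of 4 (ELF note alignment)."""
    return (n + 3) & ~3


def build_note(owner, note_type, payload):
    """Serialize a JSON payload as one little-endian ELF note record."""
    name = owner.encode() + b"\0"
    desc = json.dumps(payload).encode() + b"\0"
    header = struct.pack("<III", len(name), len(desc), note_type)
    return (header
            + name.ljust(pad4(len(name)), b"\0")
            + desc.ljust(pad4(len(desc)), b"\0"))


def parse_note(blob):
    """Decode one note record into (owner, type, payload dict)."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    offset = 12
    owner = blob[offset:offset + namesz].rstrip(b"\0").decode()
    offset += pad4(namesz)
    desc = blob[offset:offset + descsz].rstrip(b"\0")
    return owner, note_type, json.loads(desc)


# Illustrative values only; not real package data.
example = {
    "type": "rpm",
    "name": "example-package",
    "version": "1.0-1.fc36",
    "debugInfoUrl": "https://debuginfod.fedoraproject.org/",
}
blob = build_note("FDO", 0xCAFE1A7E, example)
owner, note_type, decoded = parse_note(blob)
```

The point of the sketch is that the consumer only needs the bytes of the
crashed binary to recover the package name, and can still hand the
'debugInfoUrl' value to a debuginfod-aware tool later.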

I was thinking more about this proposal over the past weekend and
where I keep ending up is that this is really optimizing for a small
use case by touching ELF metadata all over the system.  And that
strikes me as pretty invasive, so is it worth the tradeoffs and risks
and such?

Debugging is a pain, and anything that makes it easier is better.  It
has been stated multiple times that the information needs to be in the
ELF objects because containers and images may lack an RPM database.
Fair, but what about the users who want a container or image with
neither the RPM database nor systemd-coredump?  All of their ELF
files still carry the information that they removed in other ways.
Do we provide those users with a script to strip .gnu.notes from
everything or is that even a use case of concern?
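
For what such a script might look like, here is a minimal Python sketch
that wraps objcopy's --remove-section option. The default section name
".note.package" is an assumption about where the proposal stores the
metadata, and both helper names are hypothetical. Treat this as a sketch
rather than a supported tool.

```python
import subprocess


def strip_note_command(path, section=".note.package"):
    """Build the objcopy argv that drops one section from `path`.

    The default section name is an assumption for this sketch.
    """
    return ["objcopy", f"--remove-section={section}", path]


def strip_note(path, section=".note.package"):
    """Run objcopy on `path`, rewriting the file in place."""
    subprocess.run(strip_note_command(path, section), check=True)
```

Called with only an input file, objcopy rewrites that file in place, so a
real script would want to run on a copy or inside an image build step.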

Getting the system very small for container and image use has
been a goal for a while.  And sure, we're not talking about a lot of
data, but that's now.  The size of everything only grows, so is that
something to consider with the implementation of this feature?

Another thing I thought about was reproducible builds.  Does this
impact reproducible builds and if so, how do we handle that?
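
One way to frame the reproducibility and size questions, as a sketch under
stated assumptions: if the payload is derived only from values that are
fixed for a given build (name, version-release, architecture) and is
serialized deterministically, two builds of the same package yield
identical bytes. Whether the actual implementation behaves this way is not
established here, and the note_payload() helper and its field names are
hypothetical.

```python
import json


def note_payload(name, version, arch):
    """Serialize package fields to deterministic, NUL-terminated bytes."""
    fields = {
        "type": "rpm",
        "name": name,
        "version": version,
        "architecture": arch,
    }
    # sort_keys and fixed separators keep the byte output stable.
    text = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return text.encode() + b"\0"


first = note_payload("example-package", "1.0-1.fc36", "x86_64")
second = note_payload("example-package", "1.0-1.fc36", "x86_64")
payload_size = len(first)  # bytes added per ELF object, before padding
```

Anything that varies between builds, such as a timestamp or a build host
name, would break this property if it were added to the payload.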

I would feel more comfortable with this proposal if the data for
systemd-coredump was not part of the ELF metadata.  Or if it
absolutely must be part of the ELF metadata, users should know how it
can be removed.  I would also vote for a format other than JSON, but
that's just me.

Thanks,

--
David Cantrell <dcantrell@xxxxxxxxxx> Red Hat, Inc. | Boston, MA |
EST5EDT