F38 proposal: Reproducible builds: Clamp build mtimes to $SOURCE_DATE_EPOCH (System-Wide Change proposal)

https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes

This document represents a proposed Change. As part of the Changes
process, proposals are publicly announced in order to receive
community feedback. This proposal will only be implemented if approved
by the Fedora Engineering Steering Committee.

== Summary ==

The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`.
When an RPM package is built, mtimes of packaged files will be clamped
to `$SOURCE_DATE_EPOCH`, which is already set to the date of the latest
`%changelog` entry. As a result, more RPM packages will be
reproducible: the actual modification times of files that are, for
example, modified in the `%prep` section or built in the `%build`
section will not be reflected in the resulting RPM packages. Files in
RPM packages will have mtimes that are independent of the time of the
actual build.

== Owner ==
* Name: [[User:Churchyard|Miro Hrončok]], [[User:Zbyszek|Zbigniew
Jędrzejewski-Szmek]]
* Email: mhroncok at redhat.com, zbyszek at in.waw.pl


== Detailed Description ==
This change exists to make RPM package builds more reproducible. A
common problem that prevents [https://reproducible-builds.org/ build
reproducibility] lies in the mtimes (modification times) of the
packaged files.

Suppose we package an RPM package of software called `skynet` in
version `1.0`. Upstream released this version at datetime A. A Fedora
packager creates the RPM package at datetime B. Unfortunately, the
packager needs to patch the sources in the RPM `%prep` section. When
the build runs at datetime C, the modification datetime of the patched
file is set to C. When the build runs again in an otherwise identical
environment at datetime D, the modification datetime of the patched
file is set to D. As a result, the build is not bit-by-bit
reproducible, because the datetime of the build is saved in the
resulting package.
Patching is not necessary to make this happen. When a source file is
compiled into a binary file, the modification datetime is also set to
the datetime of the build. In practice, the modification datetime of
many files packaged in RPM packages is dependent on when the package
was actually built.

To eliminate this problem, we propose to clamp build mtimes to
`$SOURCE_DATE_EPOCH`. RPM builds in Fedora already set the
`$SOURCE_DATE_EPOCH` environment variable based on the latest
`%changelog` entry because the `%source_date_epoch_from_changelog`
macro is set to `1`. We will also set the
`%clamp_mtime_to_source_date_epoch` macro to `1`. As a result, when
files are packaged into the RPM package, their modification datetimes
will be clamped to `$SOURCE_DATE_EPOCH` (that is, to the datetime of
the latest changelog entry). Clamping means that all files which would
otherwise have a modification datetime later than `$SOURCE_DATE_EPOCH`
will have the modification datetime changed to `$SOURCE_DATE_EPOCH`;
files with an mtime earlier than (or equal to) `$SOURCE_DATE_EPOCH`
will retain their original mtimes.
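
The clamping rule described above amounts to taking the minimum of the
file's mtime and `$SOURCE_DATE_EPOCH`. The following is a minimal
Python sketch of that rule only, not RPM's actual implementation; the
function name `clamp_mtime` and the timestamps are made up for
illustration.

```python
def clamp_mtime(mtime: int, source_date_epoch: int) -> int:
    """Return the mtime a packaged file would carry after clamping.

    Illustrative model of the rule in the text, not RPM's code:
    later mtimes are lowered to source_date_epoch, earlier ones are kept.
    """
    return min(mtime, source_date_epoch)


# Hypothetical timestamps (seconds since the Unix epoch).
SOURCE_DATE_EPOCH = 1_650_000_000  # datetime of the latest %changelog entry
upstream_file = 1_600_000_000      # untouched upstream file, older than the changelog
patched_file = 1_660_000_000       # file modified while the build was running

print(clamp_mtime(upstream_file, SOURCE_DATE_EPOCH))  # original mtime is retained
print(clamp_mtime(patched_file, SOURCE_DATE_EPOCH))   # lowered to SOURCE_DATE_EPOCH
```

Because the result no longer depends on when the build ran, two builds
of the same sources at datetimes C and D yield the same mtime for the
patched file.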

This functionality is already implemented in RPM. We will enable it by
setting `%clamp_mtime_to_source_date_epoch` to `1`.

=== Non-goal ===

We do not aim to make all Fedora packages reproducible (at least not
as part of this change proposal). We just eliminate one problem that
we consider the biggest blocker for reproducible builds.

=== Python bytecode ===

When a Python bytecode cache file (a `.pyc` file) is built, the mtime
of the corresponding Python source file (`.py`) is included in it for
invalidation purposes. Since the `.pyc` file is created before RPM
clamps the mtime of the `.py` file, the mtime stored in the `.pyc`
file might be later than the clamped mtime of the `.py` file.

With the previous example, if `skynet` is written in Python:
# `skynet.py` is modified in `%prep` and hence has mtime set to the
time of the build
# `skynet.pyc` is generated in `%install` and the mtime of `skynet.py`
is saved in it
# RPM clamps the mtime of `skynet.py`
# `skynet.pyc` is considered invalid by Python at runtime, as the
stored and actual mtimes of `skynet.py` don't match
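
The steps above can be reproduced outside of RPM with a small Python
sketch. It assumes the timestamp-based `.pyc` header layout from
PEP 552 (magic number, flags, source mtime, source size, four bytes
each) and forces timestamp invalidation explicitly. The `os.utime`
call merely stands in for RPM's clamping, and the helper names and
timestamps are hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def stored_source_mtime(pyc_path):
    """Read the source mtime recorded in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags = struct.unpack("<I", header[4:8])[0]
    if flags != 0:
        raise ValueError("hash-based .pyc: no source mtime is stored")
    return struct.unpack("<I", header[8:12])[0]


def demo(build_time, source_date_epoch):
    """Return (mtime stored in the .pyc, mtime of the .py after clamping)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "skynet.py")
        pyc = os.path.join(tmp, "skynet.pyc")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        # Step 1: the source is modified during the build.
        os.utime(src, (build_time, build_time))
        # Step 2: the bytecode cache records the current source mtime.
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        # Step 3: stand-in for RPM clamping the mtime of the source file.
        clamped = min(build_time, source_date_epoch)
        os.utime(src, (clamped, clamped))
        return stored_source_mtime(pyc), int(os.stat(src).st_mtime)


stored, actual = demo(build_time=1_700_000_000, source_date_epoch=1_600_000_000)
# Step 4: the two values differ, so Python would treat the cache as stale.
print(stored, actual, stored == actual)
```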

To solve this, we will modify Python to clamp the stored mtime to
`$SOURCE_DATE_EPOCH` as well (when building RPM packages). Upstream
Python chooses to invalidate bytecode cache based on hashes instead of
mtimes when `$SOURCE_DATE_EPOCH` is set, but that could cause
performance issues for big files, so Fedora's Python already deviates
from upstream behavior when building RPM packages. To avoid
accidentally breaking the behavior when
`%clamp_mtime_to_source_date_epoch` is not set to `1`, RPM macros and
buildroot policy scripts for creating the Python bytecode cache will
be modified to unset `$SOURCE_DATE_EPOCH` when
`%clamp_mtime_to_source_date_epoch` is not set to `1`.
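
The intended outcome can be modeled in a few lines of Python. This is
a sketch of the behavior described above under simplifying
assumptions, not the actual Fedora patch: the function
`mtime_to_store_in_pyc` is hypothetical, and the environment is passed
in as a plain dictionary.

```python
def mtime_to_store_in_pyc(source_mtime, environ):
    """Model of the proposed behavior (hypothetical helper, not the real patch).

    If SOURCE_DATE_EPOCH is present, the mtime written into the .pyc is
    clamped to it; if the variable was unset, the real mtime is stored.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        return source_mtime
    return min(source_mtime, int(epoch))


build_time = 1_700_000_000         # hypothetical mtime of skynet.py during the build
source_date_epoch = 1_600_000_000  # hypothetical latest %changelog entry

# Clamping enabled: RPM packages skynet.py with the clamped mtime,
# and the .pyc stores the same value.
packaged = min(build_time, source_date_epoch)
stored = mtime_to_store_in_pyc(build_time, {"SOURCE_DATE_EPOCH": str(source_date_epoch)})
print(stored == packaged)

# Clamping disabled: the macros unset SOURCE_DATE_EPOCH, RPM keeps the
# real mtime, and the .pyc stores the real mtime too.
stored_unclamped = mtime_to_store_in_pyc(build_time, {})
print(stored_unclamped == build_time)
```

In both configurations the mtime recorded in the `.pyc` file matches
the mtime of the packaged `.py` file, which is what keeps the bytecode
cache valid.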

This behavior might be proposed upstream if it turns out to be
superior to the current upstream choice, unless we
[https://discuss.python.org/t/14594 redesign the bytecode-source
relationship entirely] instead.

=== Opting out ===

Packages broken by this new behavior can unset
`%clamp_mtime_to_source_date_epoch`, but packagers are encouraged to
fix the underlying problem instead.

== Feedback ==
Enabling this RPM feature was
[https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/126
proposed as a pull request] to {{package|redhat-rpm-config}} in April
2021. It received good feedback with the exception of the following:

* it was said the change needs to be coordinated with the Python maintainers
* it was said the change should be done via a change process for
better coordination and exposure

We believe that by proposing this via the change process and planning
for the changes needed in Python, both issues are addressed.

== Benefit to Fedora ==
We believe that many RPM packages will become reproducible and others
will be more reproducible than before. The benefits of reproducible
builds are better explained at https://reproducible-builds.org/

== Scope ==
* Proposal owners:
** Propose a PR for {{package|redhat-rpm-config}} (set
`%clamp_mtime_to_source_date_epoch` to `1`, possibly only when
`%source_date_epoch_from_changelog` is set)
** Propose a PR for {{package|python-rpm-macros}} (unset
`$SOURCE_DATE_EPOCH` while creating `.pyc` files iff
`%clamp_mtime_to_source_date_epoch` is not `1`)
** Propose a PR for
[https://src.fedoraproject.org/rpms/python3.11/blob/b2d80045f9/f/00328-pyc-timestamp-invalidation-mode.patch
Python's bytecode invalidation mode patch] for all Python versions
that have it
** Backport (the new portion of) the patch to older Pythons
({{package|python2.7}}, {{package|python3.6}} and PyPys)
** Test everything together in Copr and deploy it if it works.
** Optional: Run some reproducibility tests before and after this
change and produce some statistics.

* Other developers:
** Test their packages with the new behavior, report problems, and
opt out if really needed.
* Release engineering: N/A (not needed for this Change)
* Policies and guidelines: N/A (not needed for this Change)
* Trademark approval: N/A (not needed for this Change)
* Alignment with Objectives: N/A (not needed for this Change)


== Upgrade/compatibility impact ==
Nothing anticipated.

== How To Test ==
The change owners plan to perform a mass rebuild in Copr to see if
this breaks anything significantly.
If it actually works as anticipated, they also plan to run some
reproducibility tests and hopefully produce some statistics before and
after this change.

Other packagers can test by building their packages and verifying that
they still work as expected and that no packaged files have mtimes
later than the last `%changelog` entry.
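
One possible way to check the mtimes of a built package is sketched
below in Python. It assumes the `rpm` command-line tool is available
and uses the `FILEMTIMES` and `FILENAMES` query tags; the helper names
are hypothetical, and the `$SOURCE_DATE_EPOCH` value to compare
against (the timestamp of the last `%changelog` entry) has to be
supplied by the caller.

```python
import subprocess


def query_rpm_mtimes(rpm_path):
    """Ask the rpm CLI for one 'mtime path' line per packaged file."""
    result = subprocess.run(
        ["rpm", "-qp", "--queryformat", r"[%{FILEMTIMES} %{FILENAMES}\n]", rpm_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_mtime_listing(listing):
    """Turn 'mtime path' lines into a list of (mtime, path) tuples."""
    entries = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        mtime, _, path = line.partition(" ")
        entries.append((int(mtime), path))
    return entries


def files_newer_than(entries, source_date_epoch):
    """Return paths whose mtime is later than source_date_epoch."""
    return [path for mtime, path in entries if mtime > source_date_epoch]


# Usage with a real package (the file name is a placeholder):
#   listing = query_rpm_mtimes("skynet-1.0-1.fc38.noarch.rpm")
#   print(files_newer_than(parse_mtime_listing(listing), 1_650_000_000))
```

With clamping enabled, the returned list is expected to be empty.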

To verify if this change has landed, run: `rpm --eval
'%clamp_mtime_to_source_date_epoch'` on Fedora 38. The result should
be `1`.

== User Experience ==
Users of Fedora Linux on their machines should not be impacted at all.
Users who build RPM packages atop Fedora will be impacted by this
change the same way Fedora is.

== Dependencies ==

* RPM needs to support this (it already does)
* RPM needs to set `$SOURCE_DATE_EPOCH` (it already does)

== Contingency Plan ==

* Contingency mechanism: The change owners,
{{package|redhat-rpm-config}} maintainers, or proven packagers will
revert the change in {{package|redhat-rpm-config}}. That should be
enough to undo everything, as the changes in Python should depend on
it. If that is not enough, revert everything.
* Contingency deadline: Ideally, we should do this before the Mass
Rebuild. Technically, we can land it any time before the Beta Freeze,
but it would not change all the packages, which is a bit messy.
* Blocks release? No

== Documentation ==

This page is the documentation.

== Release Notes ==



-- 
Ben Cotton
He / Him / His
Fedora Program Manager
Red Hat
TZ=America/Indiana/Indianapolis