On 15. 01. 20 23:59, Zbigniew Jędrzejewski-Szmek wrote:
On Wed, Jan 15, 2020 at 06:05:42PM +0100, Miro Hrončok wrote:
### File types (and bytecode caches)
The orthogonal dimension is the file type. The Python standard library
contains directories with both "extension modules" (usually written
in C and compiled to a `*.cpython-38-x86_64-linux-gnu.so` shared
object file) and "pure Python" modules (written in Python and saved
as a `*.py` source file).
Each pure Python module comes in 4 files:
- `module.py` -- the source
- `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode cache
- `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache (level 1)
- `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache (level 2)
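The naming scheme in the list above can be reproduced with `importlib.util.cache_from_source`. This is only a small sketch: `module.py` is the placeholder name from the list, and the tag in the result (`cpython-38` above) depends on the interpreter that runs it.

```python
import importlib.util

def bytecode_cache_paths(source_path):
    """Return the cache file paths the running interpreter would use."""
    return {
        # An empty string means "no optimization tag in the file name".
        "regular": importlib.util.cache_from_source(source_path, optimization=""),
        "opt-1": importlib.util.cache_from_source(source_path, optimization=1),
        "opt-2": importlib.util.cache_from_source(source_path, optimization=2),
    }

# "module.py" is a placeholder, not a real stdlib module.
paths = bytecode_cache_paths("module.py")
for level, path in paths.items():
    print(level, path)
```

If `PYTHONPYCACHEPREFIX` is set (Python 3.8 and later), the returned paths point into that prefix instead of a `__pycache__` directory next to the source.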
I suspect that the difference in speed between loading various .pyc
files is negligible. Do you have actual benchmarks for this?
Loading time is theoretically faster for smaller files. Generally, the opt-2
files in the stdlib are a bit smaller, but the opt-1 are not. Technically, I
agree that the loading time difference is negligible.
But no, we didn't do any benchmarking (yet anyway) at the scale of the current
document, that would take a lot of time and energy. The plan is to only do them
for solutions we actually decide to go for (but only if we anticipate a change
-- for example not with the hardlink-based deduplication, but yes with the
zipped stdlib).
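This is not a benchmark, but the size difference mentioned above can be illustrated with a rough sketch. It compiles a made-up snippet at each optimization level and compares the marshalled code size, which is the bulk of a `.pyc` file. The snippet and the helper name `marshalled_size` are assumptions for illustration only.

```python
import marshal

# Made-up source with docstrings and an assert, so each level has
# something to strip. Real stdlib modules contain few asserts.
SOURCE = '''
"""Module docstring, dropped at optimization level 2."""

def check(value):
    """Function docstring, dropped at optimization level 2."""
    assert value >= 0, "value must be non-negative"
    return value * 2
'''

def marshalled_size(source, optimize):
    """Size in bytes of the marshalled code object (the .pyc payload)."""
    code = compile(source, "<example>", "exec", optimize=optimize)
    return len(marshal.dumps(code))

sizes = {level: marshalled_size(SOURCE, level) for level in (0, 1, 2)}
for level, size in sizes.items():
    print(f"optimize={level}: {size} bytes")
```

Level 1 only removes `assert` statements and `if __debug__` code, so modules without them produce opt-1 files of the same size as the regular ones. Level 2 also drops docstrings, which is where most of the saving comes from.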
### Solution 5: Stop shipping mandatory bytecode cache
This solution sounds simple: We no longer ship the bytecode cache
mandatorily. Technically, we move the `.pyc` files to a subpackage
of `python3-libs` (or three different subpackages, that is not
important here). And we only *Recommend* them from `python3-libs` --
by default, the users get them, but for space-critical Fedora
flavors (such as container images) the maintainers can opt out and
so can the power users.
This would **save 18.6 MiB / 50%** -- quite a lot.
However, as said earlier, if the bytecode cache files are not there,
Python attempts to create them upon first import. That can result in
several problems; here we will try to propose how to work around
them.
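For reference, Python already has a process-wide switch for this: `sys.dont_write_bytecode` (also set by `-B` or `PYTHONDONTWRITEBYTECODE`). The sketch below imports a throwaway module with the switch on and checks that no `__pycache__` directory appears. The module name `flagdemo_module` is made up for the demonstration.

```python
import importlib
import pathlib
import sys
import tempfile

def import_without_cache(module_name="flagdemo_module"):
    """Import a throwaway module with bytecode writing disabled.

    Returns True if a __pycache__ directory appeared next to the source.
    """
    with tempfile.TemporaryDirectory() as tmp:
        (pathlib.Path(tmp) / f"{module_name}.py").write_text("VALUE = 42\n")
        previous = sys.dont_write_bytecode
        sys.dont_write_bytecode = True  # same effect as `python3 -B`
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            assert module.VALUE == 42
            return (pathlib.Path(tmp) / "__pycache__").exists()
        finally:
            # Restore global state so the demo has no side effects.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
            sys.dont_write_bytecode = previous

cache_created = import_without_cache()
print("__pycache__ created:", cache_created)
```

This switch applies to every import in the process, which is why it does not solve the problem for system-installed modules only.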
Below using a flag file in each __pycache__ directory is suggested.
What about a different route: having a flag file for all descendants
of a directory?
The idea was to avoid traversing up, as that can potentially slow down Python
invocation from a deep PATH. But yes, that is possible as well.
For example, /usr/lib/python3.8/.dont_write_bytecode
would cover all modules under /usr/lib/python3.8/.
If a .pyc file is present, python could still make use of it.
This would be a nicer solution because it wouldn't require modifying
individual packages, but would still avoid the selinux issues and
slowdowns from failed attempts to write the optimized files.
The __pycache__ files wouldn't need to exist at all.
Correct.
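To make the directory-level idea concrete, here is a minimal sketch of the lookup it implies. Nothing like this exists in CPython today; the function name `bytecode_writing_disabled` is hypothetical, and the flag file name is taken from the example path above.

```python
import pathlib
import tempfile

FLAG_NAME = ".dont_write_bytecode"  # name from the example in this thread

def bytecode_writing_disabled(source_path, flag_name=FLAG_NAME):
    """Hypothetical check: does any ancestor directory hold the flag file?"""
    source = pathlib.Path(source_path).resolve()
    # Walks from the module's directory up to the filesystem root.
    return any((parent / flag_name).is_file() for parent in source.parents)

with tempfile.TemporaryDirectory() as tmp:
    tree = pathlib.Path(tmp)
    deep = tree / "flagged" / "pkg" / "sub"
    deep.mkdir(parents=True)
    (tree / "flagged" / FLAG_NAME).touch()
    other = tree / "unflagged"
    other.mkdir()

    flagged_result = bytecode_writing_disabled(deep / "module.py")
    unflagged_result = bytecode_writing_disabled(other / "module.py")

print("flagged tree:", flagged_result)
print("unflagged tree:", unflagged_result)
```

The walk up through `source.parents` is the traversal cost mentioned earlier: it adds one filesystem check per directory level, so an actual implementation would probably want to cache the result per directory.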
--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok