On 11/21/24 17:32, Miro Hrončok wrote:
> On 21. 11. 24 23:11, William Cohen wrote:
>> Sediment has been designed to work with the RPM build process.
>> Currently, one needs to use modified RPM macros. These can be created
>> quickly by writing the output of the sediment make_sediment_rpmmacros
>> command into ~/.rpmmacros. One will also need to set the pgo
>> macro to 1 for the rpmbuild process. The rpm spec file has minimal
>> modifications. It has the call graph file stored as a source file and
>> defines the global call_graph to point to the source file that holds
>> the call graph.
>
> Hey Will,
>
> let's say I want to try this for Python. Where do I start? The README on
> https://github.com/wcohen/sediment is not very helpful.
>
> This is what I did based on your email:
>
> $ sudo dnf --enable-repo=updates-testing install sediment
> ...
> Installing sediment-0:0.9.3-1.fc41.noarch
>
> I ran make_sediment_rpmmacros, and it gave me some macros. Now I am
> supposed to put those into ~/.rpmmacros. Except I never build Python
> locally; I use Koji or mock. I can probably amend this to use %global and
> insert it into python3.14.spec. But what else do I need to do? Do you have
> a step-by-step kind of document I can follow?
>

Hi Miro,

The tooling doesn't yet fit your workflow of building packages in Koji
and mock. I am looking into ways of addressing that issue.

In an earlier email I mentioned that the important thing was to have good
profiling data. Do you have suggestions for benchmarks that would
properly exercise the Python interpreter?

I have used pyperformance (https://github.com/python/pyperformance) to
get some call graph data for Python and added that to a python3.13 srpm
available at https://koji.fedoraproject.org/koji/taskinfo?taskID=126526066.
Note that Koji is NOT building with the code layout optimization.
One would still need to build python3.13-3.13.0-1.fc41.src.rpm locally,
with sediment-0.9.4
(https://koji.fedoraproject.org/koji/buildinfo?buildID=2596791) installed
and ~/.rpmmacros set up, using the following steps:

  make_sediment_rpmmacros > ~/.rpmmacros
  rpm -Uvh python3.13-3.13.0-1.fc41.src.rpm
  cd ~/rpmbuild/SPECS
  rpmbuild -ba --define "pgo 1" python3.13.spec

The notable difference in the python3.13.spec file is the addition of:

  # Call graph information
  SOURCE12: perf_pybenchmark.gv
  %global call_graph %{SOURCE12}

The perf_pybenchmark.gv file was generated with these steps:

  python3 -m pip install pyperformance
  perf record -e branches:u -j any_call -o perf_pybenchmark.data pyperformance run -f -o fc41_x86_python_baseline.json
  perf report -i perf_pybenchmark.data --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
  perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv

I then added the file to the python srpm:

  cp perf_pybenchmark.gv ~/rpmbuild/SOURCES/.
  # edit ~/rpmbuild/SPECS/python3.13.spec to add call graph info

The pyperformance results were mixed when comparing the code layout
optimized python against the baseline version. This can be seen in the
attached python_pgo.out, generated by:

  python3 -m pyperf compare_to fc41_x86_python_baseline.json fc41_x86_python_pgo.json --table > python_pgo.out

It looks like a number of the benchmarks are microbenchmarks that are
unlikely to benefit much from the code layout optimizations. Are there
other Python performance tests that you would suggest that have a
larger footprint and would better gauge the possible performance
improvement from the code layout optimization? Are there better Python
code examples to collect profiling data on?

-Will

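For anyone who would rather script the "edit the spec file" step than do it by hand, here is a minimal sketch. It is only an illustration, not part of sediment: the helper name add_call_graph is hypothetical, and it assumes the new lines can simply go after the last Source tag in the spec, which may not hold for a spec that wraps its sources in conditionals.

```python
import re

# Matches "Source:", "Source0:", "SOURCE12:" and so on (rpm tags are case-insensitive).
SOURCE_TAG = re.compile(r"^source\d*\s*:", re.IGNORECASE)


def add_call_graph(spec_text: str, source_number: int, gv_name: str) -> str:
    """Return spec_text with the call graph source and global added.

    Hypothetical helper: inserts the three lines shown in the email
    right after the last Source tag found in the spec text.
    """
    lines = spec_text.splitlines()
    source_lines = [i for i, line in enumerate(lines) if SOURCE_TAG.match(line)]
    if not source_lines:
        raise ValueError("no Source tag found in spec text")
    last = source_lines[-1]
    addition = [
        "",
        "# Call graph information",
        f"SOURCE{source_number}: {gv_name}",
        f"%global call_graph %{{SOURCE{source_number}}}",
    ]
    return "\n".join(lines[: last + 1] + addition + lines[last + 1:]) + "\n"


if __name__ == "__main__":
    # Tiny made-up spec, just to show where the lines land.
    demo = "Name: demo\nSource0: demo.tar.gz\nSource1: extra.txt\n\n%description\nDemo\n"
    print(add_call_graph(demo, 12, "perf_pybenchmark.gv"))
```

The source number has to be one that the spec does not already use; 12 is simply the number that was free in the python3.13 spec above.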
Benchmarks with tag 'apps':
===========================

Benchmark hidden because not significant (5): 2to3, chameleon, docutils, html5lib, tornado_http

Benchmarks with tag 'asyncio':
==============================

+-------------------------+--------------------------+-----------------------+
| Benchmark               | fc41_x86_python_baseline | fc41_x86_python_pgo   |
+=========================+==========================+=======================+
| async_tree_cpu_io_mixed | 613 ms                   | 621 ms: 1.01x slower  |
+-------------------------+--------------------------+-----------------------+
| async_tree_eager        | 120 ms                   | 123 ms: 1.03x slower  |
+-------------------------+--------------------------+-----------------------+
| async_tree_eager_tg     | 77.1 ms                  | 78.3 ms: 1.02x slower |
+-------------------------+--------------------------+-----------------------+
| Geometric mean          | (ref)                    | 1.01x slower          |
+-------------------------+--------------------------+-----------------------+

Benchmark hidden because not significant (13): async_tree_none, async_tree_cpu_io_mixed_tg, async_tree_eager_cpu_io_mixed, async_tree_eager_cpu_io_mixed_tg, async_tree_eager_io, async_tree_eager_io_tg, async_tree_eager_memoization, async_tree_eager_memoization_tg, async_tree_io, async_tree_io_tg, async_tree_memoization, async_tree_memoization_tg, async_tree_none_tg

Benchmarks with tag 'math':
===========================

+----------------+--------------------------+-----------------------+
| Benchmark      | fc41_x86_python_baseline | fc41_x86_python_pgo   |
+================+==========================+=======================+
| float          | 92.5 ms                  | 91.3 ms: 1.01x faster |
+----------------+--------------------------+-----------------------+
| Geometric mean | (ref)                    | 1.00x faster          |
+----------------+--------------------------+-----------------------+

Benchmark hidden because not significant (2): nbody, pidigits

Benchmarks with tag 'regex':
============================

+----------------+--------------------------+-----------------------+
| Benchmark      | fc41_x86_python_baseline | fc41_x86_python_pgo   |
+================+==========================+=======================+
| regex_compile  | 151 ms                   | 148 ms: 1.01x faster  |
+----------------+--------------------------+-----------------------+
| regex_dna      | 194 ms                   | 188 ms: 1.03x faster  |
+----------------+--------------------------+-----------------------+
| regex_effbot   | 3.55 ms                  | 3.44 ms: 1.03x faster |
+----------------+--------------------------+-----------------------+
| regex_v8       | 25.7 ms                  | 24.3 ms: 1.06x faster |
+----------------+--------------------------+-----------------------+
| Geometric mean | (ref)                    | 1.03x faster          |
+----------------+--------------------------+-----------------------+

Benchmarks with tag 'serialize':
================================

+----------------------+--------------------------+-----------------------+
| Benchmark            | fc41_x86_python_baseline | fc41_x86_python_pgo   |
+======================+==========================+=======================+
| json_dumps           | 11.8 ms                  | 12.1 ms: 1.03x slower |
+----------------------+--------------------------+-----------------------+
| json_loads           | 28.9 us                  | 28.7 us: 1.01x faster |
+----------------------+--------------------------+-----------------------+
| pickle               | 11.9 us                  | 11.5 us: 1.03x faster |
+----------------------+--------------------------+-----------------------+
| pickle_dict          | 34.1 us                  | 31.5 us: 1.08x faster |
+----------------------+--------------------------+-----------------------+
| pickle_list          | 5.05 us                  | 4.82 us: 1.05x faster |
+----------------------+--------------------------+-----------------------+
| unpickle             | 16.2 us                  | 16.4 us: 1.01x slower |
+----------------------+--------------------------+-----------------------+
| unpickle_pure_python | 236 us                   | 241 us: 1.02x slower  |
+----------------------+--------------------------+-----------------------+
| xml_etree_iterparse  | 108 ms                   | 107 ms: 1.01x faster  |
+----------------------+--------------------------+-----------------------+
| Geometric mean       | (ref)                    | 1.01x faster          |
+----------------------+--------------------------+-----------------------+

Benchmark hidden because not significant (6): pickle_pure_python, tomli_loads, unpickle_list, xml_etree_parse, xml_etree_generate, xml_etree_process

Benchmarks with tag 'startup':
==============================

+------------------------+--------------------------+-----------------------+
| Benchmark              | fc41_x86_python_baseline | fc41_x86_python_pgo   |
+========================+==========================+=======================+
| python_startup         | 13.5 ms                  | 12.9 ms: 1.05x faster |
+------------------------+--------------------------+-----------------------+
| python_startup_no_site | 9.18 ms                  | 8.48 ms: 1.08x faster |
+------------------------+--------------------------+-----------------------+
| Geometric mean         | (ref)                    | 1.07x faster          |
+------------------------+--------------------------+-----------------------+

Benchmarks with tag 'template':
===============================

+----------------+--------------------------+-----------------------+
| Benchmark      | fc41_x86_python_baseline | fc41_x86_python_pgo   |
+================+==========================+=======================+
| mako           | 13.1 ms                  | 13.3 ms: 1.02x slower |
+----------------+--------------------------+-----------------------+
| Geometric mean | (ref)                    | 1.01x slower          |
+----------------+--------------------------+-----------------------+

Benchmark hidden because not significant (3): django_template, genshi_text, genshi_xml

All benchmarks:
===============

+-------------------------+--------------------------+------------------------+
| Benchmark               | fc41_x86_python_baseline | fc41_x86_python_pgo    |
+=========================+==========================+========================+
| async_tree_cpu_io_mixed | 613 ms                   | 621 ms: 1.01x slower   |
+-------------------------+--------------------------+------------------------+
| async_tree_eager        | 120 ms                   | 123 ms: 1.03x slower   |
+-------------------------+--------------------------+------------------------+
| async_tree_eager_tg     | 77.1 ms                  | 78.3 ms: 1.02x slower  |
+-------------------------+--------------------------+------------------------+
| chaos                   | 67.0 ms                  | 65.7 ms: 1.02x faster  |
+-------------------------+--------------------------+------------------------+
| comprehensions          | 19.4 us                  | 19.7 us: 1.02x slower  |
+-------------------------+--------------------------+------------------------+
| bench_mp_pool           | 9.91 ms                  | 17.2 ms: 1.74x slower  |
+-------------------------+--------------------------+------------------------+
| bench_thread_pool       | 1.45 ms                  | 1.52 ms: 1.05x slower  |
+-------------------------+--------------------------+------------------------+
| coroutines              | 24.7 ms                  | 25.0 ms: 1.01x slower  |
+-------------------------+--------------------------+------------------------+
| crypto_pyaes            | 81.6 ms                  | 80.8 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| deepcopy                | 412 us                   | 416 us: 1.01x slower   |
+-------------------------+--------------------------+------------------------+
| deepcopy_reduce         | 3.72 us                  | 3.61 us: 1.03x faster  |
+-------------------------+--------------------------+------------------------+
| deepcopy_memo           | 46.0 us                  | 45.5 us: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| float                   | 92.5 ms                  | 91.3 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| create_gc_cycles        | 1.22 ms                  | 1.20 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| gc_traversal            | 3.30 ms                  | 3.25 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| generators              | 31.1 ms                  | 30.7 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| hexiom                  | 6.85 ms                  | 6.79 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| json_dumps              | 11.8 ms                  | 12.1 ms: 1.03x slower  |
+-------------------------+--------------------------+------------------------+
| json_loads              | 28.9 us                  | 28.7 us: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| logging_format          | 7.48 us                  | 7.63 us: 1.02x slower  |
+-------------------------+--------------------------+------------------------+
| mako                    | 13.1 ms                  | 13.3 ms: 1.02x slower  |
+-------------------------+--------------------------+------------------------+
| mdp                     | 2.78 sec                 | 2.63 sec: 1.06x faster |
+-------------------------+--------------------------+------------------------+
| nqueens                 | 93.0 ms                  | 92.5 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| pickle                  | 11.9 us                  | 11.5 us: 1.03x faster  |
+-------------------------+--------------------------+------------------------+
| pickle_dict             | 34.1 us                  | 31.5 us: 1.08x faster  |
+-------------------------+--------------------------+------------------------+
| pickle_list             | 5.05 us                  | 4.82 us: 1.05x faster  |
+-------------------------+--------------------------+------------------------+
| pyflate                 | 504 ms                   | 499 ms: 1.01x faster   |
+-------------------------+--------------------------+------------------------+
| python_startup          | 13.5 ms                  | 12.9 ms: 1.05x faster  |
+-------------------------+--------------------------+------------------------+
| python_startup_no_site  | 9.18 ms                  | 8.48 ms: 1.08x faster  |
+-------------------------+--------------------------+------------------------+
| raytrace                | 288 ms                   | 291 ms: 1.01x slower   |
+-------------------------+--------------------------+------------------------+
| regex_compile           | 151 ms                   | 148 ms: 1.01x faster   |
+-------------------------+--------------------------+------------------------+
| regex_dna               | 194 ms                   | 188 ms: 1.03x faster   |
+-------------------------+--------------------------+------------------------+
| regex_effbot            | 3.55 ms                  | 3.44 ms: 1.03x faster  |
+-------------------------+--------------------------+------------------------+
| regex_v8                | 25.7 ms                  | 24.3 ms: 1.06x faster  |
+-------------------------+--------------------------+------------------------+
| richards                | 51.3 ms                  | 52.0 ms: 1.01x slower  |
+-------------------------+--------------------------+------------------------+
| scimark_lu              | 128 ms                   | 130 ms: 1.01x slower   |
+-------------------------+--------------------------+------------------------+
| scimark_sor             | 147 ms                   | 148 ms: 1.01x slower   |
+-------------------------+--------------------------+------------------------+
| scimark_sparse_mat_mult | 5.89 ms                  | 5.81 ms: 1.01x faster  |
+-------------------------+--------------------------+------------------------+
| sqlglot_normalize       | 120 ms                   | 123 ms: 1.02x slower   |
+-------------------------+--------------------------+------------------------+
| sqlite_synth            | 2.42 us                  | 2.50 us: 1.04x slower  |
+-------------------------+--------------------------+------------------------+
| telco                   | 9.00 ms                  | 8.78 ms: 1.03x faster  |
+-------------------------+--------------------------+------------------------+
| unpickle                | 16.2 us                  | 16.4 us: 1.01x slower  |
+-------------------------+--------------------------+------------------------+
| unpickle_pure_python    | 236 us                   | 241 us: 1.02x slower   |
+-------------------------+--------------------------+------------------------+
| xml_etree_iterparse     | 108 ms                   | 107 ms: 1.01x faster   |
+-------------------------+--------------------------+------------------------+
| Geometric mean          | (ref)                    | 1.00x slower           |
+-------------------------+--------------------------+------------------------+

Benchmark hidden because not significant (58): 2to3, async_generators, async_tree_none, async_tree_cpu_io_mixed_tg, async_tree_eager_cpu_io_mixed, async_tree_eager_cpu_io_mixed_tg, async_tree_eager_io, async_tree_eager_io_tg, async_tree_eager_memoization, async_tree_eager_memoization_tg, async_tree_io, async_tree_io_tg, async_tree_memoization, async_tree_memoization_tg, async_tree_none_tg, asyncio_tcp, asyncio_tcp_ssl, asyncio_websockets, chameleon, coverage, dask, deltablue, django_template, docutils, dulwich_log, fannkuch, genshi_text, genshi_xml, go, html5lib, logging_silent, logging_simple, meteor_contest, nbody, pathlib, pickle_pure_python, pidigits, pprint_safe_repr, pprint_pformat, richards_super, scimark_fft, scimark_monte_carlo, spectral_norm, sqlglot_optimize, sqlglot_parse, sqlglot_transpile, sympy_expand, sympy_integrate, sympy_sum, sympy_str, tomli_loads, tornado_http, typing_runtime_protocols, unpack_sequence, unpickle_list, xml_etree_parse, xml_etree_generate, xml_etree_process
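
To pick out the largest movers from a table like the one above without reading every row, something along these lines might help. It is a rough sketch: parse_rows and geometric_mean are hypothetical helper names, the regular expression assumes the "--table" row layout shown above, and a geometric mean recomputed from the rounded ratios in the table will only approximate the figure pyperf reports.

```python
import math
import re

# One data row: | name | baseline value | new value: N.NNx faster/slower |
ROW = re.compile(
    r"^\|\s*(?P<name>\S+)\s*\|[^|]*\|[^|]*:\s*"
    r"(?P<ratio>\d+\.\d+)x (?P<dir>faster|slower)\s*\|$"
)

# Three rows copied from the "All benchmarks" table.
SAMPLE = """\
| bench_mp_pool           | 9.91 ms                  | 17.2 ms: 1.74x slower  |
| pickle_dict             | 34.1 us                  | 31.5 us: 1.08x faster  |
| python_startup_no_site  | 9.18 ms                  | 8.48 ms: 1.08x faster  |
"""


def parse_rows(table_text):
    """Map benchmark name to speedup; values above 1.0 mean the second build is faster."""
    speedups = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # borders, headers, and the "Geometric mean" row are skipped
        ratio = float(match.group("ratio"))
        faster = match.group("dir") == "faster"
        speedups[match.group("name")] = ratio if faster else 1.0 / ratio
    return speedups


def geometric_mean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    speedups = parse_rows(SAMPLE)
    for name, value in sorted(speedups.items(), key=lambda item: item[1]):
        print(f"{name:25s} {value:.3f}")
    print(f"geometric mean of listed rows: {geometric_mean(speedups.values()):.3f}")
```

Sorted this way, bench_mp_pool (1.74x slower) stands well apart from every other row, which all fall within a few percent of the baseline.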