On Wed, 2024-08-28 at 11:29 -0600, Jerry James wrote:
>
> Upstream also objected to the license string here:
> https://packages.fedoraproject.org/pkgs/pl/pl/. According to the git
> log, the License tag was converted to SPDX in December 2022, but that
> page still shows the License tag from before that. How are these
> pages generated? Is there a button I can push to make that page
> update?

Hah, I think I see the issue, and it's a fun one.

That is https://pagure.io/fedora-packages-static , which is deployed in
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/fedora-packages-static .
Turns out the deployment doesn't matter much in this case (it does in
others) - all we really do in infra is deploy a container that's built
out of the upstream repo, and let it run. So, let's focus on what's in
the container.

The Dockerfile says:

    ENTRYPOINT [ "./container/entrypoint.sh" ]

which is
https://pagure.io/fedora-packages-static/blob/master/f/container/entrypoint.sh .
That runs supervisord, which is from the supervisor package, and is a
kinda dispatcher thing. And in the container is a config file for it:
https://pagure.io/fedora-packages-static/blob/master/f/container/supervisord.conf .
AFAICT this is really just telling it to run three things on container
startup and leave them running forever - nginx (the web server), uwsgi
(which connects solr to nginx), and..."updatescript". Which runs
/usr/local/src/packages/container/update-packages.sh , which is
https://pagure.io/fedora-packages-static/blob/master/f/container/update-packages.sh .

OK! So this is our thing to update the data, it looks like. It's an
eternally-looping shell script which runs `make all` then
`make update-solr` one time, then runs `make html-only` then
`make update-solr` every hour.

So, on we go to
https://pagure.io/fedora-packages-static/blob/master/f/Makefile , where
we find that `make html-only` does sync-repositories, then fetch-data,
then html.
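To make the ordering concrete, here is a rough dry-run sketch of that update cycle in Python. It is not the real update-packages.sh (which is a shell script); the function name `update_commands` and the `HTML_ONLY_STEPS` list are made up for illustration, and where exactly the real script sleeps between rounds is not modelled here.

```python
import itertools
from typing import Iterator, List

# Per the Makefile walk-through, `make html-only` expands to these steps.
HTML_ONLY_STEPS = ["sync-repositories", "fetch-data", "html"]


def update_commands() -> Iterator[List[str]]:
    """Yield the make invocations in the order the update script runs them."""
    # One-time full build when the container starts.
    yield ["make", "all"]
    yield ["make", "update-solr"]
    # Then the same pair forever; the real script waits about an hour
    # between rounds, which this dry run leaves out.
    while True:
        yield ["make", "html-only"]
        yield ["make", "update-solr"]


# Dry run: show the first six commands instead of executing anything.
first_six = list(itertools.islice(update_commands(), 6))
for cmd in first_six:
    print(" ".join(cmd))
```

The point is only that after the initial `make all`, every later refresh goes through `html-only`, so whatever those three steps do decides what the page shows.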
`sync-repositories` runs `bin/fetch-repository-dbs.py`, which *also*
gets release data from Bodhi (not sure why the duplication?) and
considers only releases whose 'state' is 'current'. For each 'current'
release it figures out where to find the repository metadata for the
latest compose for the given release (...pretty sure we have quite a
lot of places duplicating this logic...), downloads it, and dumps it
into a SQL database (more or less) -
https://pagure.io/fedora-packages-static/blob/master/f/bin/fetch-repository-dbs.py .

`fetch-data` runs `bin/get-product-names.py`, which dumps Bodhi data on
Fedora releases into a JSON file -
https://pagure.io/fedora-packages-static/blob/master/f/bin/get-product-names.py .

And finally, `html` runs `bin/generate-html.py`, which produces the
actual HTML output you get to see from the data in the database -
https://pagure.io/fedora-packages-static/blob/master/f/bin/generate-html.py .
It does lots of stuff, but it has this rather interesting block:

    # Override package metadata with rawhide (= lastest) values.
    if first_pkg_encounter or release_branch == "fedora-rawhide":
        pkg.summary = raw["summary"]
        pkg.description = raw["description"]
        pkg.upstream = raw["url"]
        pkg.license = raw["rpm_license"]
        pkg.maintainers = maintainer_mapping["rpms"].get(srpm_name, [])

(at line 219). That is, it only updates all that info - including the
license - the *first time* the script encounters the package name in
the database, or if it's currently running against Rawhide.
Specifically, if its "release_branch" is "fedora-rawhide", which - if
you parse it back - means the database file produced by
`fetch-repository-dbs.py` is called `fedora-rawhide-(something).sqlite`.

Why is this interesting? Three reasons! Let's go back and look at
fetch-repository-dbs.py.

First, remember the thing about how it only considers releases whose
state is 'current'? The state of the release that represents Rawhide
is not 'current', it is 'pending'.
See https://bodhi.fedoraproject.org/releases/F42 . So that script will
never consider the Rawhide release at all.

Second, even if it did, Bodhi does not call the release "Rawhide" in
any way. It considers it to be "42". Its name is F42, its version is
42, and so on. fetch-repository-dbs.py has no logic for realizing which
release in Bodhi represents Rawhide (other things that use Bodhi
release data *do* handle this - basically you have to find the
highest-numbered 'pending' release, or the release whose 'branch' is
'rawhide'). It just takes the "id_prefix" and the "version" for each
release as its identifiers, and names the database files based on
those. So if it *did* consider the Rawhide release, it would call it
"fedora-42" and name the db files that way.

And third, the script looks at each existing database file, and if it
didn't find a matching release in the Bodhi data, it *removes it*.

What this means is that the HTML output script is meant to always wind
up using the data for each package from Rawhide (the latest data), but
it currently isn't, because it's not running against Rawhide at all. It
will use the data from the first repository it happens to run against
that contains the package, and ignore data from all other repositories.

At this point I got to wondering how the code got this way - it seems
weird that there is special purpose code for Rawhide which we can never
reach. And, aha, turns out this is all new code, because it used to use
PDC:

https://pagure.io/fedora-packages-static/c/9463d3793272c349a1377fd22830b3c0ab8e80e1

PDC was decommissioned a couple of weeks back. I checked the logs for
the current fedora-packages-static deployment, and it looks like we're
definitely using the Bodhi code now, and it's behaving as I surmised:

bin/fetch-repository-dbs.py --target-dir /etc/packages/repositories
Fetching active releases from Bodhi...
Found: ['epel-10', 'epel-10-testing', 'epel-8', 'epel-8-testing', 'epel-9', 'epel-9-testing', 'fedora-39', 'fedora-39-updates', 'fedora-39-updates-testing', 'fedora-40', 'fedora-40-updates', 'fedora-40-updates-testing']

Note how it did *not* find 'fedora-42', or 'fedora-rawhide'.

So here's my theory about what's going on. Up until PDC got
decommissioned, this was probably working fine, and I'd guess the data
for the pl package was up to date. When PDC got decommissioned,
updating would've been broken - `fetch-repository-dbs.py` was written
to just exit if it can't reach PDC. So the data would've stayed the
same. Then, whenever our fedora-packages-static deployment switched
over to the Bodhi code, this happened:

1. fetch-repository-dbs.py ran, did not collect Fedora 41 or
   42/Rawhide from Bodhi because they're 'pending' not 'current', and
   deleted the existing database files for those releases.
2. generate-html.py ran, and because there were no Rawhide database
   files, regenerated the HTML output using the data from the first
   repository it encountered that contained the pl package.

From the logs, it looks like it reads the files in alphabetical order
(or uses the same order fetch-repository-dbs.py encountered them), so
it went:

> Processing database files for epel-10.
> Processing database files for epel-10-testing.
> Processing database files for epel-8.
> Processing database files for epel-8-testing.
> Processing database files for epel-9.
> Processing database files for epel-9-testing.
> Processing database files for fedora-39.
> Processing database files for fedora-39-updates.
> Processing database files for fedora-39-updates-testing.
> Processing database files for fedora-40.
> Processing database files for fedora-40-updates.
> Processing database files for fedora-40-updates-testing.
> Processing database files for epel-10.
> Processing database files for epel-10-testing.
> Processing database files for epel-8.
> Processing database files for epel-8-testing.
> Processing database files for epel-9.
> Processing database files for epel-9-testing.
> Processing database files for fedora-39.
> Processing database files for fedora-39-updates.
> Processing database files for fedora-39-updates-testing.
> Processing database files for fedora-40.
> Processing database files for fedora-40-updates.
> Processing database files for fedora-40-updates-testing.

The pl package is not in EPEL 10, so it used the data from EPEL 8. And
on the EPEL 8 branch, it still has the old-style License:

https://src.fedoraproject.org/rpms/pl/blob/epel8/f/pl.spec#_76

Elementary, my dear Jerry. ;)

I guess I'll try and patch up the app's usage of Bodhi release data and
send a PR, that should sort it out.
--
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @adamw@xxxxxxxxxxxxx
https://www.happyassassin.net
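For anyone who wants to see the effect without reading the scripts, here is a small simulation of the behaviour described in this mail. It is only a sketch, not the code from fetch-repository-dbs.py or generate-html.py: the `Release` tuple, `db_labels`, `pick_license`, the simplified prefixes, and the "old-style" / "SPDX" placeholder strings are all invented for illustration. The parts taken from the analysis above are the 'current' filter, the first-encounter-or-rawhide override, and the idea of treating the release whose branch is 'rawhide' as Rawhide, which is just one possible way a fix could go.

```python
from typing import Dict, List, NamedTuple, Optional


class Release(NamedTuple):
    prefix: str   # simplified stand-in for Bodhi's id_prefix (assumption)
    version: str
    state: str    # "current" or "pending"
    branch: str   # "rawhide" marks the Rawhide release


def db_labels(releases: List[Release], rawhide_aware: bool) -> List[str]:
    """Return database labels in alphabetical order, as the logs suggest."""
    labels = []
    for rel in releases:
        if rawhide_aware and rel.branch == "rawhide":
            # Hypothetical fix: recognise Rawhide by its branch.
            labels.append(f"{rel.prefix}-rawhide")
        elif rel.state == "current":
            # Behaviour described in the mail: only 'current' releases count.
            labels.append(f"{rel.prefix}-{rel.version}")
    return sorted(labels)


def pick_license(labels: List[str], licenses: Dict[str, str]) -> Optional[str]:
    """Apply the first-encounter-or-rawhide override to one package."""
    chosen = None
    first_pkg_encounter = True
    for label in labels:
        if label not in licenses:
            continue  # the package is not in this repository
        if first_pkg_encounter or label == "fedora-rawhide":
            chosen = licenses[label]
        first_pkg_encounter = False
    return chosen


# Illustrative data only; the license strings are placeholders.
RELEASES = [
    Release("epel", "10", "current", "epel10"),
    Release("epel", "8", "current", "epel8"),
    Release("fedora", "40", "current", "f40"),
    Release("fedora", "42", "pending", "rawhide"),
]
LICENSES = {"epel-8": "old-style", "fedora-40": "SPDX", "fedora-rawhide": "SPDX"}

current_behaviour = pick_license(db_labels(RELEASES, False), LICENSES)
fixed_behaviour = pick_license(db_labels(RELEASES, True), LICENSES)
print(current_behaviour, fixed_behaviour)
```

With Rawhide filtered out, the first repository that contains the package (EPEL 8 in this toy data) wins and nothing later overrides it; once a `fedora-rawhide` label exists, it takes over regardless of processing order.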