Re: SWIPL moves to Fedora (packages.fpo page generation)

On Wed, 2024-08-28 at 11:29 -0600, Jerry James wrote:
> 
> Upstream also objected to the license string here:
> https://packages.fedoraproject.org/pkgs/pl/pl/.  According to the git
> log, the License tag was converted to SPDX in December 2022, but that
> page still shows the License tag from before that.  How are these
> pages generated?  Is there a button I can push to make that page
> update?

Hah, I think I see the issue, and it's a fun one.

That is https://pagure.io/fedora-packages-static , which is deployed in
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/fedora-packages-static
. Turns out the deployment doesn't matter much in this case (it does in
others) - all we really do in infra is deploy a container that's built
out of the upstream repo, and let it run.

So, let's focus on what's in the container. The Dockerfile says:

ENTRYPOINT [ "./container/entrypoint.sh" ]

which is
https://pagure.io/fedora-packages-static/blob/master/f/container/entrypoint.sh
. That runs supervisord , which is from the supervisor package, and is
a kinda dispatcher thing. And in the container is a config file for it:
https://pagure.io/fedora-packages-static/blob/master/f/container/supervisord.conf
.

AFAICT this is really just telling it to run three things on container
startup and leave them running forever - nginx (the web server), uwsgi
(which connects solr to nginx), and..."updatescript". Which runs
/usr/local/src/packages/container/update-packages.sh , which is
https://pagure.io/fedora-packages-static/blob/master/f/container/update-packages.sh
.

OK! So this is our thing to update the data, it looks like. It's an
eternally-looping shell script which runs `make all` then
`make update-solr` one time, then runs `make html-only` then
`make update-solr` every hour. So, on we go to
https://pagure.io/fedora-packages-static/blob/master/f/Makefile , where
we find that `make html-only` does sync-repositories, then fetch-data,
then html.

`sync-repositories` runs `bin/fetch-repository-dbs.py`, which gets
release data from Bodhi (as does `fetch-data`, see below - not sure why
the duplication?) and considers only releases whose 'state' is
'current'. For each 'current'
release it figures out where to find the repository metadata for the
latest compose for the given release (...pretty sure we have quite a
lot of places duplicating this logic...), downloads it, and dumps it
into a SQL database (more or less) -
https://pagure.io/fedora-packages-static/blob/master/f/bin/fetch-repository-dbs.py
.

`fetch-data` runs `bin/get-product-names.py`, which dumps Bodhi data on
Fedora releases into a JSON file -
https://pagure.io/fedora-packages-static/blob/master/f/bin/get-product-names.py
.

And finally, `html` runs `bin/generate-html.py`, which produces the
actual HTML output you get to see from the data in the database - 
https://pagure.io/fedora-packages-static/blob/master/f/bin/generate-html.py
. It does lots of stuff, but it has this rather interesting block:

            # Override package metadata with rawhide (= lastest) values.
            if first_pkg_encounter or release_branch == "fedora-rawhide":
                pkg.summary = raw["summary"]
                pkg.description = raw["description"]
                pkg.upstream = raw["url"]
                pkg.license = raw["rpm_license"]
                pkg.maintainers = maintainer_mapping["rpms"].get(srpm_name, [])

(at line 219). That is, it only updates all that info - including the
license - the *first time* the script encounters the package name in
the database, or if it's currently running against Rawhide.
Specifically, if its "release_branch" is "fedora-rawhide", which - if
you parse it back - means the database file produced by
`fetch-repository-dbs.py` is called `fedora-rawhide-(something).sqlite`.

Why is this interesting? Three reasons! Let's go back and look at
fetch-repository-dbs.py.

First, remember the thing about how it only considers releases whose
state is 'current'? The state of the release that represents Rawhide is
not 'current', it is 'pending'. See
https://bodhi.fedoraproject.org/releases/F42 . So that script will
never consider the Rawhide release at all.
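
To make that concrete, here's a minimal sketch of that kind of filter.
This is not the actual code from fetch-repository-dbs.py: the records
are made-up, trimmed-down stand-ins for Bodhi release data, and
`current_only` is a hypothetical helper name.

```python
# Made-up sample records for illustration; only the fields used here.
SAMPLE_RELEASES = [
    {"name": "F40", "version": "40", "state": "current", "branch": "f40"},
    {"name": "F41", "version": "41", "state": "pending", "branch": "f41"},
    {"name": "F42", "version": "42", "state": "pending", "branch": "rawhide"},
]


def current_only(releases):
    """Keep only the releases whose state is exactly 'current'."""
    return [rel for rel in releases if rel["state"] == "current"]


# The pending releases, including the one standing in for Rawhide,
# never make it past the filter.
print([rel["name"] for rel in current_only(SAMPLE_RELEASES)])
```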

Second, even if it did, Bodhi does not call the release "Rawhide" in
any way. It considers it to be "42". Its name is F42, its version is
42, and so on. fetch-repository-dbs.py has no logic for realizing which
release in Bodhi represents Rawhide (other things that use Bodhi
release data *do* handle this - basically you have to find the highest-
numbered 'pending' release, or the release whose 'branch' is
'rawhide'). It just takes the "id_prefix" and the "version" for each
release as its identifiers, and names the database files based on
those. So if it *did* consider the Rawhide release, it would call it
"fedora-42" and name the db files that way.
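
For comparison, picking out the Rawhide release along the lines
described above might look something like this. It's a sketch under
assumptions: the sample records are invented for illustration,
`find_rawhide` is a hypothetical helper, and none of it is taken from
the existing tools.

```python
# Made-up sample records; assumes the list is already Fedora-only.
SAMPLE_RELEASES = [
    {"name": "F40", "version": "40", "state": "current", "branch": "f40"},
    {"name": "F41", "version": "41", "state": "pending", "branch": "f41"},
    {"name": "F42", "version": "42", "state": "pending", "branch": "rawhide"},
]


def find_rawhide(releases):
    """Return the record that represents Rawhide, or None if there is none."""
    # First choice: the release whose branch is literally 'rawhide'.
    for rel in releases:
        if rel.get("branch") == "rawhide":
            return rel
    # Fallback: the highest-numbered release that is still 'pending'.
    pending = [
        rel
        for rel in releases
        if rel.get("state") == "pending" and str(rel.get("version", "")).isdigit()
    ]
    if not pending:
        return None
    return max(pending, key=lambda rel: int(rel["version"]))


print(find_rawhide(SAMPLE_RELEASES)["name"])
```

Either rule alone picks the same record for this sample; trying the
branch first just avoids guessing when a release is marked explicitly.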

And third, the script looks at each existing database file, and if it
didn't find a matching release in the Bodhi data, it *removes it*.
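
The effect of that cleanup step can be sketched as a simple set
difference. This works on release names rather than real files, the
`existing` list is an assumption about what might have been on disk
before the switch, and `stale_releases` is a hypothetical helper, not
the script's own function.

```python
def stale_releases(existing, found):
    """Names that have database files on disk but were not returned by Bodhi."""
    return sorted(set(existing) - set(found))


# The releases the script reported finding in Bodhi.
found = [
    "epel-10", "epel-10-testing", "epel-8", "epel-8-testing",
    "epel-9", "epel-9-testing", "fedora-39", "fedora-39-updates",
    "fedora-39-updates-testing", "fedora-40", "fedora-40-updates",
    "fedora-40-updates-testing",
]

# Assumed leftovers from earlier runs (illustrative only).
existing = found + ["fedora-41", "fedora-rawhide"]

# Everything listed here would have its database files removed.
print(stale_releases(existing, found))
```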

What this means is that the HTML output script is meant to always wind
up using the data for each package from Rawhide (the latest data), but
it currently isn't, because it's not running against Rawhide at all. It
will use the data from the first repository it happens to run against
that contains the package, and ignore data from all other repositories.
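
Here's a small simulation of that behaviour. It mirrors the condition
from the generate-html.py block quoted earlier, but everything else is
an assumption for illustration: `pick_license` is a hypothetical
helper, and the license values are placeholders, not the real tags
from any spec file.

```python
def pick_license(rows):
    """Return the license that survives, given (release_branch, license)
    pairs in the order the databases are processed."""
    seen = False
    chosen = None
    for release_branch, rpm_license in rows:
        first_pkg_encounter = not seen
        # Same condition as the quoted block: only the first encounter
        # or a Rawhide database may set the value.
        if first_pkg_encounter or release_branch == "fedora-rawhide":
            chosen = rpm_license
        seen = True
    return chosen


without_rawhide = [
    ("epel-8", "old-style tag"),
    ("fedora-39", "SPDX tag"),
    ("fedora-40", "SPDX tag"),
]
with_rawhide = without_rawhide + [("fedora-rawhide", "SPDX tag")]

print(pick_license(without_rawhide))  # first repository wins
print(pick_license(with_rawhide))     # Rawhide overrides it
```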

At this point I got to wondering how the code got this way - it seems
weird that there is special purpose code for Rawhide which we can never
reach. And, aha, turns out this is all new code, because it used to use
PDC:

https://pagure.io/fedora-packages-static/c/9463d3793272c349a1377fd22830b3c0ab8e80e1

PDC was decommissioned a couple of weeks back. I checked the logs for
the current fedora-packages-static deployment, and it looks like we're
definitely using the Bodhi code now, and it's behaving as I surmised:

bin/fetch-repository-dbs.py --target-dir /etc/packages/repositories
Fetching active releases from Bodhi...
Found: ['epel-10', 'epel-10-testing', 'epel-8', 'epel-8-testing', 'epel-9', 'epel-9-testing', 'fedora-39', 'fedora-39-updates', 'fedora-39-updates-testing', 'fedora-40', 'fedora-40-updates', 'fedora-40-updates-testing']

Note how it did *not* find 'fedora-42', or 'fedora-rawhide'.

So here's my theory about what's going on. Up until PDC got
decommissioned, this was probably working fine, and I'd guess the data
for the pl package was up to date. When PDC got decommissioned,
updating would've been broken - `fetch-repository-dbs.py` was written
to just exit if it can't reach PDC. So the data would've stayed the
same. Then, whenever our fedora-packages-static deployment switched
over to the Bodhi code, this happened:

1. fetch-repository-dbs.py ran, did not collect Fedora 41 or 42/Rawhide
from Bodhi because they're 'pending' not 'current', and deleted the
existing database files for those releases.
2. generate-html.py ran, and because there were no Rawhide database
files, regenerated the HTML output using the data from the first
repository it encountered that contained the pl package. From the logs,
it looks like it reads the files in alphabetical order (or uses the
same order fetch-repository-dbs.py encountered them), so it went:

> Processing database files for epel-10.
> Processing database files for epel-10-testing.
> Processing database files for epel-8.
> Processing database files for epel-8-testing.
> Processing database files for epel-9.
> Processing database files for epel-9-testing.
> Processing database files for fedora-39.
> Processing database files for fedora-39-updates.
> Processing database files for fedora-39-updates-testing.
> Processing database files for fedora-40.
> Processing database files for fedora-40-updates.
> Processing database files for fedora-40-updates-testing.
> Processing database files for epel-10.
> Processing database files for epel-10-testing.
> Processing database files for epel-8.
> Processing database files for epel-8-testing.
> Processing database files for epel-9.
> Processing database files for epel-9-testing.
> Processing database files for fedora-39.
> Processing database files for fedora-39-updates.
> Processing database files for fedora-39-updates-testing.
> Processing database files for fedora-40.
> Processing database files for fedora-40-updates.
> Processing database files for fedora-40-updates-testing.

The pl package is not in EPEL 10, so it used the data from EPEL 8. And
on the EPEL 8 branch, it still has the old-style License:

https://src.fedoraproject.org/rpms/pl/blob/epel8/f/pl.spec#_76
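
A quick way to see why EPEL 8 ends up first: a plain string sort of the
release names from the log reproduces the processing order exactly. The
`has_pl` set below is an assumption for illustration (only the EPEL 10
absence and the EPEL 8 presence come from the discussion above).

```python
found = [
    "epel-10", "epel-10-testing", "epel-8", "epel-8-testing",
    "epel-9", "epel-9-testing", "fedora-39", "fedora-39-updates",
    "fedora-39-updates-testing", "fedora-40", "fedora-40-updates",
    "fedora-40-updates-testing",
]

# The order seen in the log is just lexical order: 'epel-10' sorts
# before 'epel-8' because '1' < '8', and every 'epel-*' sorts before
# every 'fedora-*'.
in_log_order = sorted(found) == found

# Assumed set of repositories that carry the package (illustrative).
has_pl = {"epel-8", "fedora-39", "fedora-40"}

# The first repository in processing order that contains the package.
first_hit = next(name for name in sorted(found) if name in has_pl)

print(in_log_order, first_hit)
```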

Elementary, my dear Jerry. ;)

I guess I'll try and patch up the app's usage of Bodhi release data and
send a PR, that should sort it out.
-- 
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @adamw@xxxxxxxxxxxxx
https://www.happyassassin.net







