On Tue, May 17, 2022 at 05:46:25PM +0000, Gary Buhrmaster wrote:
> On Tue, May 17, 2022 at 2:41 PM Vitaly Zaitsev via devel
> <devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > But I think this change also requires automatic conversion of all
> > available SPECs, because manual conversion will take years.
>
> Automating where possible (the existing license has a
> one-to-one mapping) is desirable, but realistically
> there are just too many packages that currently have
> a license such as the poster child "BSD" that are
> going to require someone(*) to actually look at the
> upstream license files to decide which SPDX id
> is the right one (and not all upstreams even name
> their license files consistently or the contents of
> those license files have minor syntactic variations).

Automating the change of identifiers is only meaningful if the values we
currently have in the License field are correct. Given that the only time
someone other than the package maintainer validates the License field against
what is actually in the software is during initial package review, it is
possible that some packages have added or changed licenses and the spec files
are no longer in sync. We know this happens when package maintainers make
announcements about upcoming license changes in a package. Many packagers are
good about this, but it is easy to miss a change sometimes when you are doing
updates.

> (*) I suppose it is conceivable someone could
> create a sufficiently accurate AI/ML model
> to scan the spec file, all the sources, and choose
> correctly. If this was an ongoing activity that
> might even make sense. But for a one time
> activity I suspect packagers are going to have
> to do it manually unless you are volunteering to
> build and test that automation.

I think a better thing to do would be to use a scanner like scancode[1] to
check the source tree in question and then construct a License expression for
the spec file from its results.
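
To make that last step concrete, here is a rough sketch of what "construct a
License expression from the results" could look like. This is not scancode's
actual output format or API: the per-file input shape, the function names, the
choice to join everything with AND, and the single mapping entry are all
assumptions made for illustration. A person still has to decide whether the
result is right (for example, whether something is really an OR).

```python
# Illustrative sketch only: the input shape and names below are assumptions,
# not scancode's real output format.

def build_license_expression(detections):
    """Join every distinct SPDX identifier found in the tree with AND.

    `detections` maps a file path to the list of SPDX identifiers a scanner
    reported for that file (hypothetical, simplified shape).
    """
    found = set()
    for spdx_ids in detections.values():
        found.update(spdx_ids)
    # Sort so the same tree always yields the same expression.
    return " AND ".join(sorted(found))


def needs_review(current_field, proposed, known_equivalents):
    """Return True if a human should look before the spec file is changed.

    `known_equivalents` maps an old License value to the SPDX expression it
    is believed to correspond to (illustrative entries only).
    """
    return known_equivalents.get(current_field) != proposed


if __name__ == "__main__":
    detections = {
        "src/main.c": ["GPL-2.0-or-later"],
        "src/compat/strlcpy.c": ["BSD-3-Clause"],
        "COPYING": ["GPL-2.0-or-later"],
    }
    proposed = build_license_expression(detections)
    print(proposed)
    print(needs_review("GPLv2+", proposed, {"GPLv2+": "GPL-2.0-or-later"}))
```

In this made-up tree the spec says "GPLv2+" but a BSD-licensed file has crept
in, so the proposed expression differs from the straight identifier swap and
the package gets flagged for a manual look.
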
In many cases it will be the same as what we have in the spec file, just with
different identifiers. But we would be using the opportunity to both move to
new license identifiers and audit the information at the same time. Note that
scancode isn't perfect, but it would be used as a workflow tool here as the
contributor audits the licensing information in a package.

I realize this is a lot of work. It would be best done in hackfest-type
sessions with the work divided up among subsets of packages. It would be a
good opportunity for new contributors to learn how things are structured and
send PRs to existing packages.

[1] https://github.com/nexB/scancode-licensedb

Thanks,

-- 
David Cantrell <dcantrell@xxxxxxxxxx>
Red Hat, Inc. | Boston, MA | EST5EDT