* Daniel P. Berrangé:

> On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
>> I think Richard said that he would start a thread like this, but it
>> hasn't happened, so I feel like I should get this off my chest now.
>>
>> <https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis>
>> starts with this:
>>
>> | No “effective license” analysis
>> |
>> | The License: field is meant to provide a simple enumeration of the
>> | licenses found in the source code that are reflected in the binary
>> | package. No further analysis should be done regarding what the
>> | "effective" license is, such as analysis based on theories of GPL
>> | interpretation or license compatibility or suppositions that
>> | “top-level” license files somehow negate different licenses appearing
>> | on individual source files.
>>
>> This is contradictory. I think there are two aspects here:
>>
>> * Determine possible licenses that end up in the binary package.
>>
>> * Perform algebraic simplifications on the license list.
>>
>> Both analyses are forms of effective licensing analysis. Of course, you
>> cannot derive an SPDX identifier without doing any analysis. However, I
>> strongly believe that the first approach (determining the binary package
>> license) is itself a form of effective licensing analysis, and similar
>> reasons for package maintainers not doing this apply. The derived
>> SPDX identifier will reflect both the package source code and what went
>> into the build system.
>
> It could perhaps be worded better, but I don't see this as contradictory,
> it is just a matter of what you consider "effective analysis" to refer to.
> The last sentence expands on this to say that 'effective' in this context
> is referring to the analysis of license compatibility that Fedora previously
> recommended.

I think it goes beyond terminology. I think determining the binary RPM licenses involves complexities similar to those of the license algebra.
I can't imagine consensus emerging around that. There's just no firm reasoning why we ignore header files, dynamic linking, and the glibc startup code in the License: tag, but not static linking in general. I think coming up with a consistent set of rules is even more complicated than some sort of license algebra, or rules for ignoring certain copyright files. So I think the perceived simplification of the rules fell short, and the present rules are still unworkable.

> The analysis maintainers are being asked to do today is not about
> interpreting licensing. They are "merely" being asked to determine what
> source files contain code that becomes part of the resulting
> binary RPM. This is more build system analysis than license analysis,
> and distinct from what Fedora would traditionally describe as
> "effective license analysis".

But that's extremely subjective until we have a consistent set of rules, preferably accompanied by training and tooling for automated license propagation according to the rules we set forth (similar to what we have today for Rust, but for C/C++ and other languages that use a mix of static and dynamic linking). I just don't think we can come up with a consistent set of rules, accepted by the wider industry, for how a build process transforms the source code licenses and the licenses of the build environment into the binary output licenses. Until then, we are basically in the same spot as we were when there was some expectation to perform effective source license analysis. For example, we have fairly strong evidence that the industry as a whole believes that the license of the statically linked glibc startup code can be ignored. Why is that so?

>> * Some package maintainers, when translating to SPDX, merely translate
>> the existing License: line as best as they can, without looking at the
>> actual sources or produced binaries.
>
> This I think is probably the main flaw in the process we asked our
> maintainers to follow.
>
> At a high level we portrayed the whole exercise as merely a terminology
> change, but it was not.
>
> Given the removal of the effective license analysis requirement, that
> was / is an oversimplification.

Well, I don't agree with this characterization. We are required to determine binary RPM licenses, which still requires substantial license impact analysis. And we don't have guidelines for that.

> Strictly speaking I think the exercise ought to have been portrayed as
> more of a license (re-)audit. In the general case maintainers ought to
> be redoing the license audit part of the new package review process,
> for all existing packages, not blindly converting existing terminology.

That's how my management and immediate colleagues have interpreted it, and how I looked at it as well. It has an enormous cost because even for core GNU packages, no one seems to have taken such a close look before. Maybe that's because there is traditionally little overlap between SPDX users and GPL users, but I can't really believe these groups are totally separate.

>> In the light of this, I would like to suggest updating the guidelines in
>> the following way:
>>
>> The License: line should be based on the sources only. Using a tool
>> such as Fossology to discover relevant licenses and their SPDX tags is
>> sufficient. No analysis of how licenses from package source code or the
>> build environment propagate into binary RPMs should be performed.
>> Individual SPDX identifiers that a tool has listed should be separated
>> by AND. Package maintainers are encouraged to re-run license analysis
>> tooling on the source code as part of major package rebases, and
>> update the License: tag accordingly.
>>
>> To me, that seems to be much more manageable.
>
> What I'm not a fan of with this approach is that it would cause us
> to include licenses that are clearly irrelevant for Fedora binary
> packages.
> If we consider the "license" tag to be something for end
> users to look at, I think this will be misleading.

We can come up with something that looks at the state of the tree after %prep, or something like that. The problem with dropping stuff arbitrarily is that it again makes it impossible to rely on tooling.

> For example in one package I reviewed there is kernel code that is
> only built on Solaris which is under the CDDL. Including that in
> the Fedora binary RPM license feels totally wrong.

I disagree. Upstream may have copied code from the CDDL part of the tree to other parts without updating the license. If we ignore the CDDL license, we are saying that this hasn't happened, and I doubt we are in a position to make such a certification for most packages. Of course someone may have copied code from a Stack Overflow answer (which is generally available under incompatible license terms), and we wouldn't know about that either. But suppressing license information actually present in the source package (although in a supposedly unused location) seems different.

> In many packages using autotools there are snippets of m4 code that
> are under a variety of licenses, again not affecting the output.
> Those would "bloat" the license tag for little obvious gain.

We decided we had to include those licenses because the m4 code generates config.h, which is included in the build as if it were a source file. Perhaps we can ignore that because of the general rule that licensing of header files does not matter. But that rule isn't part of the guidelines, even though it is a key part of what makes binary RPM license analysis workable (otherwise you'd end up with a lot of noise from system headers, leading to the problem you noted).

> I do agree though that doing *perfect* build system analysis to figure
> out what source files become part of the binary RPMs is impractical
> for any non-trivial packages.

It's not impractical, it's just rather costly (training and tooling).
> My approach has been to scan the source for licenses, and then look
> at source files with any licenses I was surprised to see. Often it
> is possible to exclude these unexpected licenses, because they are
> obviously part of the build system, or are obviously for a different
> OS platform.
>
> I would describe this as trying to meet the spirit of having the
> RPM license reflect binary content, while acknowledging the reality
> that maintainers won't fully analyse the build system as it is too
> time consuming & impractical.

That's not unreasonable.

> I might suggest adding an extra sentence to make it more explicit
> that the binary RPM license is not a perfect representation of
> the binary content, as it may sometimes include extra licenses from
> source files that were not relevant. This would reflect the somewhat
> pragmatic approach that I think maintainers already take in practice.

I would welcome that. I would also welcome updating the Rust guidelines accordingly, to clarify that the kind of buildroot-to-binary-RPM propagation that the tooling performs is optional and not required by (the spirit of) the Fedora guidelines.

Thanks,
Florian
_______________________________________________
legal mailing list -- legal@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to legal-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/legal@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue