On Mon, Aug 21, 2023 at 04:25:22PM +0200, Florian Weimer wrote:
> * Daniel P. Berrangé:
>
> > On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
> >> I think Richard said that he would start a thread like this, but it
> >> hasn't happened, so I feel like should get this off my chest now.
> >>
> >> <https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis>
> >> starts with this:
> >>
> >> | No “effective license” analysis
> >> |
> >> | The License: field is meant to provide a simple enumeration of the
> >> | licenses found in the source code that are reflected in the binary
> >> | package. No further analysis should be done regarding what the
> >> | "effective" license is, such as analysis based on theories of GPL
> >> | interpretation or license compatibility or suppositions that
> >> | “top-level” license files somehow negate different licenses appearing
> >> | on individual source files.
> >>
> >> This is contradictory. I think there are two aspects here:
> >>
> >> * Determine possible licenses that end up in the binary package.
> >>
> >> * Perform algebraic simplifications on the license list.
> >>
> >> Both analyses are forms of effective licensing analysis. Of course, you
> >> cannot derive an SPDX identifier without doing any analysis. However, I
> >> strongly believe that the first approach (determining the binary package
> >> license) is itself a form of effective licensing analysis, and similar
> >> reasons for package maintainers not doing this applies. The derived
> >> SPDX identifier will reflect both the package source code and what went
> >> into the build system.
> >
> > It could perhaps be worded better, but I don't see this as contradictory,
> > it is just a matter of what you consider "effective analysis" to refer to.
> > The last sentence expands on this to say that 'effective' in this context
> > is referring to the analysis of license compatibility that Fedora previously
> > recommended.
>
> I think it goes beyond terminology. I think determining the binary RPM
> licenses has similar complexities than the license algebra. I can't
> imagine consensus emerging around that. There's just no firm reasoning
> why we ignore header files and dynamic linking in the License: tag, the
> glibc startup code, but not static linking in general. I think coming
> up with a consistent rules is even more complicated than some sort of
> license algebra, or rules for ignoring certain copyright files. So I
> think the perceived simplification of the rules fell short, and the
> present rules are still unworkable.

WRT header file / glibc startup / static linking licenses being
ignored, the rationale I would express is that those pieces must
(by implication) all already be license compatible (in some way)
with the package consuming them. Admittedly, though, this is another
case of "effective license" doctrine, albeit an implicit one rather
than one made explicit by the maintainer / package reviewer.

> > What I'm not a fan on with this approach is that it would cause us
> > to include licenses that are clearly irrelevant for Fedora binary
> > packages. If we consider the "license" tag to be something for end
> > users to look at, I think this will be misleading.
>
> We can come up with something that looks at the state of the tree after
> %prep, or something like that.
>
> The problem with dropping stuff arbitrarily is that it makes it again
> impossible to rely on tooling.

IMHO no matter what we do, the value of the License field is rather
limited for semantic interpretation by automated tooling, because it
reduces a very complex situation down to a very crude expression.
It is notable that both the Debian "copyright" file format and the
REUSE format provide a far more granular expression of package
licensing, targeted at machine processing.
Although our new SPDX expressions are better for machine readability
than in the past, we should be explicit about the limitations of our
data and the problems with attempting to do any semantic analysis
based on it.

> > For example in one package I reviewed there is kernel code that is
> > only built on Solaris which is under the CDDL. Including that in
> > the Fedora binary RPM license feels totally wrong.
>
> I disagree. Upstream may have copied code from the CDDL part of the
> tree to other parts without updating the license. If we ignore the CDDL
> license, we say that hasn't happened, and I doubt we are in the position
> to make such a certification for most packages.
>
> Of course someone may have copied code from a Stackoverflow answer
> (which is generally available under incompatible license terms), and we
> wouldn't know about that either. But suppressing license information
> actually present in the source package (although in a supposedly unused
> location) seems different.
>

I don't think it is different. Both are a case of garbage-in ==
garbage-out. If upstream copied CDDL code into a file and didn't
record this in the file's stated license, then that's a problem
whether the original CDDL code is part of the same project or comes
from Stack Overflow. In both cases upstream made a mistake and failed
to record accurate license info in the source file.

We're not making any judgement or statement about the accuracy of
upstream's licensing record. We're summarizing what upstream has
presented in its source files and taking that on faith (unless
someone happens to notice some blatant inaccuracy).

This feels like a case where we should better document what our
input assumptions are with License tag data. Debian copyright files
and REUSE data will suffer the same limitation, as they both promote
the view that license information is trackable and analysable per
file. If upstream fails to record a license, the copyright/REUSE
files will be similarly inaccurate.

> > I might suggest adding an extra sentence to make it more explicit
> > that the binary RPM license is not a perfect representation of
> > the binary content, as may sometimes include extra licenses from
> > source files that were not relevant. This would reflect the somewhat
> > pragmatic approach that I think maintainers already take in practice
>
> I would welcome that. And update the Rust guidelines accordingly, to
> clarify that the kind of buildroot-to-binary-RPM propagation that the
> tooling performs is optional and not required by (the spirit of) the
> Fedora guidelines.

I agree with your general point that we've not adequately documented
many of the assumptions / simplifications that maintainers will /
should make when analysing license data in source files. The various
scenarios you've illustrated should probably be answered in some way
in the licensing pages.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|