On 8/31/23 2:39 AM, Daniel P. Berrangé wrote:
On Thu, Aug 24, 2023 at 02:52:21PM -0400, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the
Callaway system to SPDX seem to be, at root, an aesthetic distaste for
complex license expressions in RPM license metadata. This may explain
why some favor application of "effective license" analysis. I suspect
there is also a sort of psychological desire to hide the underlying
licensing complexity that characterizes many packages.
Let's take the proposed change to the kernel spec:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648/diffs#b49eece2a4839c357a77beb23d8760ff33be48cc
as an example of "complex license expressions" for which
there is likely an aesthetic distaste. Each distinct
SPDX-License-Identifier tag expression is combined such
that we end up with:
License: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-2-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR CDDL-1.0) AND ((GPL-2.0-only WITH Linux-syscall-note) OR Linux-OpenIB) AND ((GPL-2.0-only WITH Linux-syscall-note) OR MIT) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR MIT) AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND GPL-1.0-or-later AND (GPL-1.0-or-later OR BSD-3-Clause) AND (GPL-1.0-or-later WITH Linux-syscall-note) AND GPL-2.0-only AND (GPL-2.0-only OR Apache-2.0) AND (GPL-2.0-only OR BSD-2-Clause) AND (GPL-2.0-only OR BSD-3-Clause) AND (GPL-2.0-only OR CDDL-1.0) AND (GPL-2.0-only OR Linux-OpenIB) AND (GPL-2.0-only OR MIT) AND (GPL-2.0-only OR X11) AND (GPL-2.0-only WITH Linux-syscall-note) AND GPL-2.0-or-later AND (GPL-2.0-or-later OR BSD-2-Clause) AND (GPL-2.0-or-later OR BSD-3-Clause) AND (GPL-2.0-or-later OR MIT) AND (GPL-2.0-or-later WITH GCC-exception-2.0) AND (GPL-2.0-or-later WITH Linux-syscall-note) AND ISC AND LGPL-2.0-or-later AND (LGPL-2.0-or-later OR BSD-2-Clause) AND (LGPL-2.0-or-later WITH Linux-syscall-note) AND LGPL-2.1-only AND (LGPL-2.1-only OR BSD-2-Clause) AND (LGPL-2.1-only WITH Linux-syscall-note) AND LGPL-2.1-or-later AND (LGPL-2.1-or-later WITH Linux-syscall-note) AND (Linux-OpenIB OR GPL-2.0-only) AND (Linux-OpenIB OR GPL-2.0-only OR BSD-2-Clause) AND MIT AND (MIT OR Apache-2.0) AND (MIT OR GPL-2.0-only) AND (MIT OR GPL-2.0-or-later) AND (MIT OR LGPL-2.1-only) AND (MPL-1.1 OR GPL-2.0-only) AND (X11 OR GPL-2.0-only) AND (X11 OR GPL-2.0-or-later) AND Zlib AND (copyleft-next-0.3.1 OR GPL-2.0-or-later) AND (Redistributable, no modification permitted)
Given that the kernel is a very large package with many files and it has
adopted SPDX ids at the file level (which means the licensing info is
far more complete and easier to parse :) - there is nothing surprising
to me about the length of this string. It is what it is!
While the majority of files in the kernel are "GPL-2.0-only",
a number of files are offered under a choice of licenses (OR).
Even if 99% of files were simply GPL-2.0-only, it only takes
a handful of files being offered under a choice, to result in
an enormous SPDX expression like the one above. In the above
example, at a bare minimum it would only take 30 files, out
of the kernel's 80,000, to have distinct licence choices to
cause the existence of the above expression.
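To make the combinatorial point concrete, here is a minimal Python sketch of how ANDing each distinct per-file expression behaves. The function name and the file counts are made up for illustration; this is not how the kernel spec's License: field is actually generated.

```python
def combine_file_expressions(file_expressions):
    """AND together every distinct per-file SPDX expression."""
    terms = []
    for expr in sorted(set(file_expressions)):
        # Wrap compound expressions in parentheses so their meaning
        # is unchanged once they are joined with AND.
        compound = any(op in expr for op in (" OR ", " AND ", " WITH "))
        terms.append(f"({expr})" if compound else expr)
    return " AND ".join(terms)


# Hypothetical tree: 997 plain GPL-2.0-only files plus three outliers.
tree = ["GPL-2.0-only"] * 997 + [
    "GPL-2.0-only OR MIT",
    "MPL-1.1 OR GPL-2.0-only",
    "GPL-2.0-only WITH Linux-syscall-note",
]
print(combine_file_expressions(tree))
```

The 997 identical files collapse into one term, while each of the three outliers contributes a term of its own, so the length of the result tracks the number of distinct expressions rather than the number of files.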
That's an interesting point, but I'm not sure how we could justify some
kind of an exception in such a case
While this is an accurate reflection of the range of distinct
file license choices, I'm not convinced that this approach is
especially beneficial to Fedora users.
well, it's not really just about Fedora users - besides the benefit
downstream, I think there is some benefit to what Fedora is doing in a
broader, example-setting, ecosystem sense. I guess part of this feeling
comes from my thinking that any desire or attempt to obscure the license
complexity is not a good thing and potentially creates more work or
issues - reflecting the reality, to me, sets a good precedent
What purpose does it serve to list "MPL-1.1 OR GPL-2.0-only"
and "MIT OR LGPL-2.1-only", etc if only perhaps < 1% of files
carry this choice and we're not telling the user which 1% of
files it applies to ?
they can run a license scanner and create an SPDX document that shows
the file level license info to determine this. And that report will be
far more complex and lengthy than what you came up with above ;)
In that way, what you have above is a useful "summary" and accurate
reflection of the big picture
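For a tree that already carries SPDX-License-Identifier tags, the "which files does this apply to" question can be roughly answered without a full scanner. A minimal Python sketch, assuming the tag sits within the first few lines of each file; the function name and the five-line limit are illustrative choices, not part of any existing tool, and a real license scanner does far more than this:

```python
import os
import re
from collections import defaultdict

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def index_by_expression(root, max_lines=5):
    """Map each SPDX expression found under root to the files carrying it."""
    index = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for _ in range(max_lines):
                        line = fh.readline()
                        if not line:
                            break  # end of file reached
                        match = SPDX_TAG.search(line)
                        if match:
                            expr = match.group(1).strip()
                            # Drop a trailing C comment terminator, if any.
                            if expr.endswith("*/"):
                                expr = expr[:-2].strip()
                            index[expr].append(os.path.relpath(path, root))
                            break  # only the first tag per file is used
            except OSError:
                continue  # unreadable file: skip it
    return index
```

Looking up `index_by_expression(root)["MPL-1.1 OR GPL-2.0-only"]` would then list the files behind that one term of the summary expression.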
The previous effective license analysis addressed this problem,
such that everything reduced down to "GPLv2 and Redistributable"
I don't want to suggest going back to effective analysis as I
think that was overly simplified, but perhaps we can finesse
what we're doing today.
i.e. rather than trying to maintain the full list of choices, can
we eliminate all the OR clauses, such that we present just a
flat list of each distinct SPDX license name that is found?
IOW, the above kernel SPDX expression would be
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND Redistributable, no modification permitted AND X11 AND Zlib
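The flattening proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, every term is assumed to be a valid SPDX identifier (so the free-text "Redistributable, no modification permitted" entry would need separate handling), and license/exception pairs are kept as "X WITH Y" rather than the hyphenated spelling used in the list above.

```python
def flatten_expression(expression):
    """Reduce an SPDX expression to a flat AND list of distinct names."""
    # Without parentheses, the tokens are identifiers and operators only.
    tokens = expression.replace("(", " ").replace(")", " ").split()
    names = set()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("AND", "OR"):
            i += 1  # operators carry no license information here
        elif i + 2 < len(tokens) and tokens[i + 1] == "WITH":
            # Keep a license and its exception together as one name.
            names.add(f"{token} WITH {tokens[i + 2]}")
            i += 3
        else:
            names.add(token)
            i += 1
    # WITH binds more tightly than AND in SPDX, so no parentheses are needed.
    return " AND ".join(sorted(names, key=str.lower))


print(flatten_expression(
    "(MIT OR GPL-2.0-only) AND (GPL-2.0-only WITH Linux-syscall-note) "
    "AND Zlib AND (copyleft-next-0.3.1 OR GPL-2.0-or-later)"
))
```

Note what is lost: the output no longer says that MIT was offered as an alternative to GPL-2.0-only, only that both names occur somewhere in the package.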
but then this would be an exception to our original policy? and how
would we articulate that? I'm not sure why this is really any "better"
than your original - it's just shorter and truncated.
oh, and we should take a look at the "Redistributable, no modification
permitted" ones... that is likely the firmware licenses that were never
captured
I do think that the current approach can be criticized as being overly
pedantic, and perhaps also internally contradictory (some of Florian's
recent comments get at the various ways in which we are being
contradictory). We have a still-undocumented rule that what I call
"true public domain" should not be reflected in the License: field
(unless it would otherwise be empty), yet we have carefully attempted
to collect nonstandard public domain dedication statements and cover
those by `LicenseRef-Fedora-Public-Domain`. We have been using a
similar approach with `LicenseRef-Fedora-UltraPermissive`. These
basically replace Callaway system names "Public domain" (though this
was sometimes used for "true public domain") and "Freely
redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in
including `LicenseRef-Fedora-Public-Domain` and
`LicenseRef-Fedora-UltraPermissive` in the License: field since they
are associated with no conditions or obligations. In those special
cases where the License: field would otherwise be empty, we can ask
SPDX to create unique identifiers for the license text in question.
I think there is value in LicenseRef-Fedora-Public-Domain, etc
because it expresses the fact that license analysis has actually
been performed and these public domain choices have been correctly
identified. I don't like the need to special case the omission
to avoid an entirely empty License: field. If we have a need to
record LicenseRef-Fedora-Public-Domain in any scenario, we should
be consistent.
e.g. consider a package that is 100% public domain initially, so we
have to record that to avoid an empty field:
License: LicenseRef-Fedora-Public-Domain
then one day a file is added which is MIT. I would find it
pretty strange for the rule to say we can now drop the
LicenseRef-Fedora-Public-Domain to go to just record:
License: MIT
when 99% of the files are still LicenseRef-Fedora-Public-Domain
and only a single file is MIT.
IMHO the package should be changed to say
License: LicenseRef-Fedora-Public-Domain AND MIT
IOW, I think we should always be recording the license, even if
it is a public domain LicenseRef term.
100% agree
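The difference between the two rules discussed above can be shown with a small Python sketch. The function name, the `always_record` flag, and the set of "no obligations" identifiers are illustrative assumptions; the sketch only handles plain identifiers, not full SPDX expressions.

```python
NO_OBLIGATION_REFS = {
    "LicenseRef-Fedora-Public-Domain",
    "LicenseRef-Fedora-UltraPermissive",
}


def license_field(found, always_record=True):
    """Build a License: value from the set of identifiers found in a package."""
    names = set(found)
    if not always_record:
        remaining = names - NO_OBLIGATION_REFS
        # Special case: keep the LicenseRef terms only when dropping
        # them would leave the field empty.
        if remaining:
            names = remaining
    return " AND ".join(sorted(names, key=str.lower))


before = {"LicenseRef-Fedora-Public-Domain"}
after = before | {"MIT"}  # one MIT file is added later

print(license_field(before, always_record=False))
print(license_field(after, always_record=False))
print(license_field(after, always_record=True))
```

Under the special-case rule the public domain term disappears as soon as the MIT file arrives; under the "always record" rule the field simply grows by one term.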
We might want to extend this principle to other things, such as GPL
exceptions that entail no conditions in the use case encountered in
particular packages. (There is already an old issue about this, I
think concerning the Bison exception.)
Personally I like the way we're now recording the existence of each
license and exception, just not the creation of the combinatorial
expansion of each license choice.
This wouldn't do *that* much to make License: fields simpler, so maybe
it's not particularly worthwhile. There is also the problem that if we
make it optional, package maintainers may be less likely to scrutinize
things that are assumed to fall into these kinds of categories, when
in some cases they actually wouldn't, although I think it's now clear
that those situations are uncommon. In theory we'd still expect
package maintainers to submit issues to have things that seem to
qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be
challenging to enforce that expectation and the Fedora Legal team
would have to end up doing all that work themselves, which might be a
justifiable result.
As with abandoning the "license of the binary" rule, this would
seemingly be a major departure from the principles established under
the Callaway system.
Any thoughts on this?
With regards,
Daniel