Re: SPDX Statistics - R.U.R. edition

Hi Richard,

So I tried to figure out what all this really meant in practice by
transforming some of my Fedora packages. But I almost immediately ran
into conflicts with how SPDX defines license identifiers and expressions.

I think the main conflict is that SPDX identifiers and expressions are
meant to apply to individual source files (and not to describe the
general intended license of the larger work), whereas the Fedora spec
file License tag is meant to provide the approximate license of the
(sub)package as a whole (which can consist of multiple larger, possibly
independently licensed, works).

Also I found the tooling around this hard to use/understand. It seems
various tools just haven't caught up, or aren't even packaged for Fedora
itself and are only available in some giant container image blob (e.g.
fossology, really a webapp that you are then supposed to run "locally",
which I never could get working [or simply didn't understand how to
use]).

For the elfutils project Housam Alamour created a new eu-srcfiles
utility for version 0.190 (already packaged for Fedora 37..Rawhide),
which might be helpful for native ELF plus DWARF based packages. At the
moment it does require that you have the build requirements and
debugsources installed, but a newer version will query debuginfod for
that.

With that we might be able to build somewhat simpler command line tools
to help packagers extract all the license snippets found in the source
files included in each binary.

e.g. for the mutt package, you can get a rough estimate of licenses
used in all the binaries using (if that is really what you are
insisting packagers do instead of just using the declared licenses):

$ dnf install mutt
$ dnf builddep mutt
$ dnf debuginfo-install mutt

$ for i in `rpm -ql mutt`; do eu-elfclassify --elf --file $i; \
  if [ $? -eq 0 ]; then eu-srcfiles --exec $i; fi; done | sort -u \
  | xargs licensecheck --shortname-scheme spdx | cut -f2- -d: \
  | sort -u | sed -z -e 's/\n / AND /g'

 GPL-2.0-or-later AND GPL-3.0-or-later AND HPND-sell-variant AND LGPL
AND LGPL-2.1-or-later AND *No copyright* GPL-2.0-or-later AND *No
copyright* public-domain AND *No copyright* UNKNOWN AND UNKNOWN AND
Zlib

Which still requires lots of investigation (there are various
UNKNOWNs), but might be a good starting point. At least for me
something like that would be much more usable than a container image
packaged webapp.
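
The last stage of the pipeline above (sed -z -e 's/\n / AND /g') appears
to rely on every line coming out of cut starting with a space. As a
minimal sketch, assuming only POSIX sed, sort and awk, the joining could
be done by a small helper instead. The name join_licenses is
hypothetical (it is not part of elfutils or licensecheck), and the
license names in the usage comment are just sample input:

```shell
#!/bin/sh
# Hypothetical helper, not part of elfutils or licensecheck.
# Reads one license name per line on stdin, strips leading/trailing
# whitespace, drops empty lines, de-duplicates, and prints the names
# joined with " AND " on a single line.
join_licenses() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
        | LC_ALL=C sort -u \
        | awk 'NR > 1 { printf " AND " }
               { printf "%s", $0 }
               END { if (NR) print "" }'
}

# Example with made-up input:
#   printf ' Zlib\n GPL-2.0-or-later\n Zlib\n' | join_licenses
```

Something like "... | cut -f2- -d: | join_licenses" could then replace
the final "sort -u | sed -z ..." stages. It does nothing about the
UNKNOWN entries, which still need manual investigation.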

On Mon, 2023-09-18 at 20:47 -0400, Richard Fontana wrote:
> On Sun, Sep 17, 2023 at 11:37 AM Mark Wielaard <mark@xxxxxxxxx> wrote:
> > 
> > To be clear I don't mind using a different set of short-hands in the
> > License tags. Although it feels a little odd to try to create separate
> > identifiers for lax-permissive MIT/BSD like licenses which sometimes
> > differ in just one or two words.
> 
> FWIW, usually a difference of one or two words wouldn't be enough to
> result in creation of a distinct SPDX identifier. The standard applied
> by SPDX is, informally, whether the difference is "legally
> substantive" (this has its flaws but seems to work OK in practice).

But then the Hybrid BSD license that parts of bzip2 and valgrind use
actually has different identifiers depending on the version of the
package (SPDX actually has both bzip2-1.0.5 and bzip2-1.0.6, which are
literally exactly the same except for the version string and the
copyright year).

> I think anyone should be free to propose a new umbrella identifier (in
> SPDX expression format) that would cover multiple licenses, as we've
> done with `LicenseRef-Fedora-Public-Domain` and
> `LicenseRef-Fedora-UltraPermissive`. The important thing is that it be
> well defined in some way.

OK, so how would we do this for this Hybrid BSD license?
And what is "well defined"?

> > > > What is the goal of dropping the effective license and making packagers
> > > > list all the licenses of some code snippets originally incorporated
> > > > under lax-permissive licenses? Is that not just make-work for the
> > > > packager if upstream just uses one effective license?
> > > 
> > > One rationale is given in Fedora legal documentation:
> > > "There is no agreed-upon set of criteria or rules under which one can
> > > make conclusions about “effective” licenses or reduce composite
> > > license expressions to something simpler."
> > 
> > Isn't that just like most other things in Fedora: we follow
> > upstream. Upstream states the (effective) license and we just adopt
> > that. If we notice that there might be a bug and the effective license
> > isn't exactly as the upstream project states, then we fix that
> > upstream?
> 
> I basically don't recognize "effective license" as a valid concept. I
> see people using it, perhaps increasingly, but I never see any
> definition of what it means.
> It sounds like you are using it to mean "whatever the upstream project
> seems to say the license is, despite possible evidence to the
> contrary".  I'm not sure that's how other people are using "effective
> license".

I would call it the intended license. Normally an (upstream) project
declares their intended license by placing a COPYING or LICENSE
document at the top-level (or different ones in subdirs if different
parts have different intended licenses). That intended license is the
effective license, meaning the license you would have to follow when
redistributing the project. Any other licenses used in the project
would only have requirements that are subsumed by the intended license.

> I think Jilayne would disagree with this, but in practice, I also
> don't see what we could fix upstream, since there is no standard for
> how you communicate or document what the effective license is
> (regardless of what it means). The only related standard I know of for
> documenting licensing of projects is REUSE (https://reuse.software)
> which I think implicitly also rejects the concept of "effective
> licensing".

Both REUSE and SPDX are intended to be used at the individual source
file level. Trying to use them at the binary or package level, as Fedora
wants to do, seems to bring up these conflicts, yes.

That doesn't mean there aren't "standards" for this. Like I said,
upstream often has a top-level LICENSE, COPYING or README file
declaring the intended license. There is also often a NOTICES file
listing any legal notices subsumed by the intended/effective license.

> > > Basically, everyone has been making up their own interpretive system
> > > for deciding what an "effective license" is, with no consistencies
> > > across upstream packages and Fedora package maintainers.
> > 
> > Is this really a problem? Could you show an example where an upstream
> > or package maintainer stated in the license tag that the effective
> > license was say "GPLv3+", but it would have been more "correct" to state
> > that it was "GPL-3.0-or-later AND GPL-3.0-or-later WITH
> > Autoconf-exception-generic-3.0 AND GPL-3.0-or-later WITH
> > Bison-exception-2.2 AND GPL-2.0-or-later AND GPL-2.0-or-later WITH
> > Autoconf-exception-generic AND LGPL-2.1-or-later AND LGPL-2.0-or-later
> > AND X11"?
> 
> I will not argue this, but I will make two observations. One is
> something I've said before, which is that people seem to be
> complaining about the current standards for license tags only when
> they are lengthy. I think it would be more consistent to argue that we
> don't need license tags at all. I have no attachment to RPM-style
> license tags, though Red Hat finds them marginally useful for some
> purposes.

What are the purposes for which Red Hat finds the spec license tags
useful?

Just dropping the license tags from the spec file is an interesting
idea. Would we then adopt something like a separate copyright file like
Debian does?

> The other thing is that the discipline that produces license tags at
> this level of detail is what is needed to uncover licensing problems
> in packages, from Fedora's perspective as a distribution that aims to
> be made up of free software. That is, the detailed license tags are a
> side effect of a valuable license review process and I would be
> concerned that falling back on an effective license approach would
> result in the loss of the benefits we get from that process, which
> actually long precede the abandonment of the Callaway system.

I do agree that a license review process is useful. But I don't agree
this process can be condensed into a "tag" (not even with an expression
language).

> You've contributed to glibc, so you probably know that for many years
> (almost 20 years?) glibc gave the impression that its license, its
> effective license if you will, was LGPLv2.1 or later (except for parts
> that are under the GPL) but it included a substantial amount of code
> under a license that, by the mid-2000s, Debian and later Fedora came
> to regard as non-free. I am speaking of the famous Sun RPC license,
> which prohibits distribution in isolation, a common type of
> proprietary license restriction.
> 
> In that scenario, if you had a license tag that just says
> "LGPL-2.1-or-later" you are concealing the fact that there is also
> some code under a license that cannot be assimilated with LGPL (other
> than by adopting a clever post hoc interpretation which cannot
> possibly be what Sun Microsystems had in mind in the 1980s) and that
> is not even free software. It seems to me that at the very least the
> license tag, if you're going to have license tags at all, should say
> "LGPL-2.1-or-later AND LicenseRef-Assorted-Other-Free-Software AND
> LicenseRef-sun-rpc". But if there's a practice of just relying on
> whatever the effective license seems to be, you would be inclined not
> to notice a license like this in the first place. This is why the
> issue was first surfaced by Debian, I think. To your later point about
> Debian copyright files, it is obviously true that you don't need to
> have a license tag system like Fedora has for this to happen.

Right, that is the job of a legal license review process. In the above
case I would say it was just a bug in the upstream package that a
distro caught and which was fixed upstream. I think it would have been
silly to (retroactively) add some extra "tagging" for a license notice
that was in error instead of just fixing the license so that it was
actually compatible with the LGPL (which is what happened).

> While the Sun RPC problem *may* have been excised from glibc, just
> last year we found another license in glibc (and at least one other
> package), this time an IBM license [1], that we consider non-free by
> present day standards, in that case because it involves a patent
> license grant that discriminates according to specific use cases. I
> think we should aspire to finding, *exposing*, and fixing these kinds
> of problems. Exposing should mean at a minimum that we don't
> perpetuate a community-wide decades-old practice of covering these
> problems up, which seems to be one practical effect of indulging in
> effective licensing. I realize all this doesn't itself justify the
> resulting use of complex composite SPDX expressions.

Right, I assume you are talking about the resolv code which carries a
patent notice from IBM saying they might sue you if you use that code
for anything other than doing DNS resolving over TCP/IP. Which is indeed
an odd notice. Happy you found it and you are making IBM fix it. But
IMHO it is just an unintended license bug in the upstream package. It
will be fixed, so no need for some complicated license tag.

> > It seems that the "enumeration" expression is not that easy to create
> > objectively. If only because it is actually hard to know which sources
> > to scan to get the license/permission snippets (just the upstream tar
> > ball, the sources as created by fedpkg prep, those actually included
> > in the binaries which depend on the build environment, etc.)
> 
> Similar points have been raised by others. I think a good solution is
> to reformulate what we mean by "enumeration" so that it is more
> practical for Fedora package maintainers. I don't think it is a good
> solution to just give up and no longer review package source code for
> inclusion of licenses that conflict with Fedora licensing policies.

I hope the new eu-srcfiles tool can help. But I also hope we aren't
really going to try to cram all found legal snippets into matching
license tags in the spec files. If we really want to somehow ship all
these legal notices (and we already do, because we distribute the
srpms, but maybe for some reason you would like to include them all in
each binary (sub)package), then let's adopt something like the Debian
copyright file or a NOTICES file and also ask upstream to adopt that.

> > And what is the actual purpose and goal of including them in the spec License
> > tag?
> 
> For me, it's that there's no good argument for throwing away the
> information once you have it. You've reviewed a given package and
> let's say you've identified five applicable licenses (let's assume we
> know what applicable means). How do you then decide what information
> to hide? I think you are saying, "review the package thoroughly, but
> don't report what you find in the license tag, just pick the
> license(s) the upstream project indicates are effective". As suggested
> above, I'm not opposed to something similar to this approach (I think
> Jilayne would disagree though) provided that we always expose licenses
> that are not classified as 'allowed' for Fedora.

I think I agree with you that we shouldn't "throw away" any such legal
notices we found. But I don't agree SPDX license tag identifiers in
spec files are a good way to keep such notices.

> > > Also, I don't think "snippets" are the typical case. Often the non-GPL
> > > license will appear to cover a whole file or perhaps a set of multiple
> > > files. I have found it somewhat common for a Fedora package to include
> > > multiple "merely aggregated" works which may be under the GPL and
> > > other licenses. That's mere aggregation based on the license steward's
> > > traditional guidance on interpretation of the GPL. In those scenarios
> > > attempts to apply an effective license theory that ignores the non-GPL
> > > license seem to embody a misunderstanding of the orthodox
> > > interpretation of the GPL.
> > 
> > That is not my experience from working on some larger code bases.  For
> > example when we were integrating GNU Classpath/IcedTea/OpenJDK I went
> > over all the code to make sure we could merge the code bases. The top
> > level LICENSE file explains the (effective) licenses. And every source
> > code file has a header explaining the (effective) license for that
> > file (GPLv2 or GPLv2-plus-classpath-exception). But it also includes
> > lots of notices like:
> > 
> > /*
> >  * This file is available under and governed by the GNU General Public
> >  * License version 2 only, as published by the Free Software Foundation.
> >  * However, the following notice accompanied the original version of this
> >  * file and, per its terms, should not be removed:
> >  *
> >  * Copyright (c) 2004 World Wide Web Consortium,
> >  *
> >  * (Massachusetts Institute of Technology, European Research Consortium for
> >  * Informatics and Mathematics, Keio University). All Rights Reserved. This
> >  * work is distributed under the W3C(r) Software License [1] in the hope that
> >  * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
> >  * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >  *
> >  * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
> >  */
> 
> So in that example, I understand you believe that there is an
> effective license, but it can't be said that the W3C license is not
> also an applicable license, or, if it is, then why should it not be
> removed?

It is applicable, but subsumed by the GPLv2. It should never be removed
since that is just rude (and also the notice and the effective license
say such notices may not be removed).

> You can argue that the W3C license isn't worth including in the
> license tag, but that requires some formulation of a policy for what
> kinds of licenses can and can't be excluded.

If the source, like in this case, explicitly says it is not the
effective license of the file, then it doesn't need to be mentioned.
Likewise, if there is an explicit intended license for the larger work
that subsumes all such legal notices, they don't need to be mentioned,
as long as the (source) package does contain all such notices.

> > Something similar is done in glibc. For example several files I
> > contributed to were adapted from some BSD release and have a file
> > header saying the file is copyright the Free Software Foundation,
> > Inc. This file is part of the GNU C Library. And the state they are
> > distributed under the GNU Lesser General Public License 2.1 or
> > later. But also have the original BSD notice in the file:
> > 
> > /*-
> >  * Copyright (c) 1990, 1993, 1994
> >  *      The Regents of the University of California.  All rights reserved.
> >  *
> >  * Redistribution and use in source and binary forms, with or without
> >  * modification, are permitted provided that the following conditions
> >  * are met:
> >  * 1. Redistributions of source code must retain the above copyright
> >  *    notice, this list of conditions and the following disclaimer.
> >  * 2. Redistributions in binary form must reproduce the above copyright
> >  *    notice, this list of conditions and the following disclaimer in the
> >  *    documentation and/or other materials provided with the distribution.
> >  * 4. Neither the name of the University nor the names of its contributors
> >  *    may be used to endorse or promote products derived from this software
> >  *    without specific prior written permission.
> >  *
> >  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
> >  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> >  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> >  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
> >  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> >  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> >  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> >  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> >  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> >  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> >  * SUCH DAMAGE.
> >  */
> > 
> > But this is not the (effective) licenses, and there is no way to use
> > the code under that license, since all contributions since 1994 have
> > been done under the LGPL.
> 
> Again, someone is making an assumption that something is there that is
> still subject to that license, because otherwise it could be removed.

No, it cannot be removed. And no, it doesn't mean it is still subject to
that license. There is a Ship of Theseus argument to be made for just
removing the no longer applicable notice. But a) that would just be
rude. And b) the notice itself and the effective license both explicitly
say you must retain the notice.

> In review of Fedora packages over the past year, we have found a
> number of cases where it seems clear a license notice no longer
> applies to anything in the package, or never applied in the first
> place. In at least one of those cases we recommended to the upstream
> project that it remove the "phantom" license notice.

That sounds like bad advice IMHO. It also destroys historical
information.

> > Likewise for valgrind we have examples of the above. For example the
> > dhat tool which have a GPLv2+ copyright and license header, but also
> > say:
> > 
> > /*
> >    Parts of this file are derived from Firefox, copyright Mozilla Foundation,
> >    and may be redistributed under the terms of the Mozilla Public License
> >    Version 2.0, as well as under the license of this project.  A copy of the
> >    Mozilla Public License Version 2.0 is available at at
> >    https://www.mozilla.org/en-US/MPL/2.0/.
> > */
> > 
> > Again, although there is a reference to MPLv2 here, the code is only
> > available under GPLv2+.
> 
> But that notice literally says there is code available under MPL 2.0.
> 
> If the notice is incorrect, that is a bug that should be fixed
> upstream. But a mere conflict with a project's conception of what its
> effective license is would not mean that the license notice is
> incorrect.

In the case of relicensing MPLv2 to GPLv2+ you could indeed argue that
no notice of the MPLv2 should remain in the source file at all. The
MPLv2 does indeed require you to remove all MPL notices when converting
a source file to the GPL. But again I consider it rude to not even
mention the origin of the source code and provide a (historical)
reference.

Cheers,

Mark
--
_______________________________________________
legal mailing list -- legal@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to legal-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/legal@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue



