Re: SPDX Statistics - R.U.R. edition

Hi Miroslav,

On Sat, Nov 25, 2023 at 05:22:02PM +0100, Miroslav Suchý wrote:
> On 24. 11. 23 at 20:07, Mark Wielaard wrote:
> >I think the main conflict is that SPDX identifiers and expressions are
> >meant to apply to individual source files (and not describe the general
> >intended license of the larger work),
>
> SPDX license ids are used in SPDX SBOMs that are intended to
> describe large work

SBOMs only describe the software bill of materials, not the binary
packages created from them. And they don't just use a license tag, but
use multiple different ways of describing the licenses that might be
related, like the Concluded license, the InfoFromFiles licenses, the
Declared license, with a human readable license Comment explanation
describing the difference between those.

> >where the Fedora spec file
> >License tags is meant to provide the approximate license of the
> >(sub)package as a whole (which can consist of multiple larger, possibly
> >independently licensed, works).
>
> Fedora license tag should NOT provide approximate license. It should
> provide exact license.

There is no such thing as an "exact license" of a binary package. Free
Software licenses are social constructs that are interpreted in
different contexts. We can try to describe the license of the binary
(sub)package, but it will always be some "estimate" (especially if we
are only using a fixed set of identifiers and a simple expression
language).

> >For the elfutils project Housam Alamour created a new eu-srcfiles
> >utility for version 0.190 (already packaged for Fedora 37..Rawhide),
> >which might be helpful for native ELF plus DWARF based packages. At the
> >moment it does require you have the build requires and debugsources
> >installed, but a newer version will query debuginfod for that.
> >
> >With that we might be able to build somewhat simpler command line tools
> >to help packagers extract all the license snippets found in the source
> >files included in each binary.
> 
> Great. I did not know this.
> 
> >e.g. for the mutt package, you can get a rough estimate of licenses
> >used in all the binaries using (if that is really what you are
> >insisting packagers do instead of just using the declared licenses):
> >
> >$ dnf install mutt
> >$ dnf builddep mutt
> >$ dnf debuginfo-install mutt
> >
> >$ for i in `rpm -ql mutt`; do eu-elfclassify --elf --file $i; \
> >   if [ $? -eq 0 ]; then eu-srcfiles --exec $i; fi; done | sort -u \
> >   | xargs licensecheck --shortname-scheme spdx | cut -f2- -d: \
> >   | sort -u | sed -z -e 's/\n / AND /g'
> >
> >  GPL-2.0-or-later AND GPL-3.0-or-later AND HPND-sell-variant AND LGPL
> >AND LGPL-2.1-or-later AND *No copyright* GPL-2.0-or-later AND *No
> >copyright* public-domain AND *No copyright* UNKNOWN AND UNKNOWN AND
> >Zlib
> >
> >Which still requires lots of investigation (there are various
> >UNKNOWNs), but might be a good starting point. At least for me
> >something like that would be much more usable than a container image
> >packaged webapp.
> 
> There are many ways to do it. We are all exploring various ways. If
> this is better for you then use it. I am afraid this will not work
> for noarch packages.

It wouldn't. And it misses any code that isn't actually referenced in
the native binaries. On the other hand it does find licenses in code
that is included from the build requires/root, which most other
(source code only) scanners seem to miss.

So if you want something "complete" you'll need to combine multiple
such scanners. What concerns me though is that the output is so big
and only partially scriptable. It is a huge amount of work which isn't
easy to replicate. So it cannot be automated.
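
The mechanical part of combining scanners, at least, is scriptable. A
minimal sketch, assuming each scanner's output has already been reduced
to one license identifier per line; the function name merge_licenses is
hypothetical and not part of any existing tool:

```shell
# Hypothetical helper: merge license identifier lists (one identifier
# per line, one file per scanner) into a single deduplicated
# "A AND B AND C" expression.
merge_licenses() {
    # Trim surrounding whitespace and drop empty lines, then sort
    # uniquely so identifiers reported by several scanners appear once.
    cat "$@" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
        | LC_ALL=C sort -u \
        | awk 'NR > 1 { printf " AND " } { printf "%s", $0 } END { if (NR) print "" }'
}
```

This only joins the lists. Deciding what the UNKNOWN entries really are,
and whether the combined expression is meaningful, still needs a human.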

> >On Mon, 2023-09-18 at 20:47 -0400, Richard Fontana wrote:
> >>On Sun, Sep 17, 2023 at 11:37 AM Mark Wielaard<mark@xxxxxxxxx>  wrote:
> >>>To be clear I don't mind using a different set of short-hands in the
> >>>License tags. Although it feels a little odd to try to create separate
> >>>identifiers for lax-permissive MIT/BSD-like licenses which sometimes
> >>>just differ in one or two words.
> >>FWIW, usually a difference of one or two words wouldn't be enough to
> >>result in creation of a distinct SPDX identifier. The standard applied
> >>by SPDX is, informally, whether the difference is "legally
> >>substantive" (this has its flaws but seems to work OK in practice).
> >But then for the Hybrid BSD license that parts of bzip and valgrind
> >use, it actually has different identifiers depending on the version of
> >the package (it actually has both bzip2-1.0.5 and bzip2-1.0.6 which are
> >literally exactly the same except for the version string and the
> >copyright year).
> 
> The SPDX uses markup which allows variation in license. E.g. when you look at
> 
> https://spdx.org/licenses/BSD-3-Clause.html the red parts allow variations. What and how is better visible in source:
> 
> https://github.com/spdx/license-list-XML/blob/main/src/BSD-3-Clause.xml
> 
> The <copyrightText> tag allows any variation. <alt> allows only variations matching a regexp.

OK. Hopefully that will then be done for this Hybrid-BSD license, so
it isn't stuck on some old obsolete bzip2 version.

> >OK, so how would we do this for this Hybrid BSD license?
> >And what is "well defined"?
> 
> Open issue with your proposal at https://gitlab.com/fedora/legal/fedora-license-data

I don't have any specific proposal. Let's just hope SPDX will create
a new generic Hybrid-BSD variant. I do find it somewhat disturbing
that Fedora contributors are asked to file issues in these external
third-party proprietary trackers.

> >>I basically don't recognize "effective license" as a valid concept. I
> >>see people using it, perhaps increasingly, but I never see any
> >>definition of what it means.
> >>It sounds like you are using it to mean "whatever the upstream project
> >>seems to say the license is, despite possible evidence to the
> >>contrary".  I'm not sure that's how other people are using "effective
> >>license".
> >I would call it the intended license. Normally an (upstream) project
> >declares their intended license by placing a COPYING or LICENSE
> >document at the top-level (or different ones in subdirs if different
> >parts have different intended licenses). That intended license is the
> >effective license, meaning the license you would have to follow when
> >redistributing the project. Any other licenses used in the project
> >would only have requirements that are subsumed by the intended license.
> 
> No.
> 
> This was maybe enough in the past, when industry used OSS projects
> rarely. Now they are used massively. And we want industry to comply
> with our OSS licenses. If we want this then we should provide a good
> overview of what licenses are used.
> 
> E.g., your mutt package. The upstream claims it is GPL-2.0-or-later https://gitlab.com/muttmua/mutt/-/blob/master/COPYRIGHT?ref_type=heads
> but there is
> 
> https://gitlab.com/muttmua/mutt/-/blob/master/wcwidth.c?ref_type=heads#L8
> which is HPND-Markus-Kuhn. And now imagine that there is a
> user/company for which the HPND-Markus-Kuhn license is problematic and
> they cannot use it. If you use only
> 
>   License: GPL-2.0-or-later
> 
> they will never know. But when you use
> 
>   License: GPL-2.0-or-later AND HPND-Markus-Kuhn
> 
> then it is pretty easy for them to do the audit, avoid this package, and use an alternative.

Why would they?
The full text of the HPND-Markus-Kuhn is:

"Permission to use, copy, modify, and distribute this software for any
purpose and without fee is hereby granted. The author disclaims all
warranties with regard to this software."

Which is completely subsumed by the GPL. There is literally no extra
useful information provided by adding AND HPND-Markus-Kuhn.

> This example may look artificial, but I know a lot of companies
> that want to avoid GPL-3.0-or-later.

And how does that help Fedora?

> And Fedora itself avoids many
> licenses that others find OK. E.g. JSON or BSD-3-Clause-Clear
> 
> https://docs.fedoraproject.org/en-US/legal/not-allowed-licenses/

That is fine. Don't include them. They aren't valid License tag values.

> >That doesn't mean there aren't "standards" for this. Like I said
> >upstream often has a top-level LICENSE, COPYING or README file
> >declaring the intended license. There also often is a NOTICES file
> >listing any legal notices subsumed by the intended/effective license.
> 
> This is not standard. This is habit. Very far from any possible
> automation and machine parsing.

I think it is a pretty standard convention and easy to automate.
Various source code repositories already do this and show you the
project's license based on scanning those files.
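
As a rough illustration of the first step, here is a sketch that lists
candidate license and notice files at the top of a source tree. The
function name list_license_files and the exact name patterns are my
assumptions, and -maxdepth and -iname are find extensions (GNU, BSD)
rather than POSIX:

```shell
# Hypothetical helper: list top-level files whose names follow the usual
# LICENSE / COPYING / NOTICE convention. Defaults to the current directory.
list_license_files() {
    # Only look at regular files directly in the given directory;
    # match names case-insensitively and return them in a stable order.
    find "${1:-.}" -maxdepth 1 -type f \
        \( -iname 'LICENSE*' -o -iname 'LICENCE*' \
           -o -iname 'COPYING*' -o -iname 'NOTICE*' \) \
        | LC_ALL=C sort
}
```

The result could then be handed to a scanner, for instance
list_license_files . | xargs licensecheck --shortname-scheme spdx, as in
the mutt example quoted earlier. Projects that use other names (mutt
itself uses COPYRIGHT) would need extra patterns.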

Cheers,

Mark



