Effective license analysis: required or not?

I think Richard said that he would start a thread like this, but it
hasn't happened, so I feel like I should get this off my chest now.

<https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis>
starts with this:

| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.

This is contradictory.  I think there are two aspects here:

* Determine possible licenses that end up in the binary package.

* Perform algebraic simplifications on the license list.

Both are forms of effective license analysis.  Of course, you cannot
derive an SPDX identifier without doing some analysis.  However, I
strongly believe that the first step (determining the binary package
license) is itself a form of effective license analysis, and that
similar reasons for package maintainers not doing it apply.  The derived
SPDX identifier will reflect both the package source code and what went
into the build system.

Below, I'm collecting a list of observations of what I believe is the
current approach in this area, as taken by package maintainers carrying
out the SPDX conversion.  To me, it strongly suggests that the SPDX
identifiers we derive today do not accurately reflect binary RPM package
licensing, even when lots of package maintainers put in the extra effort
to determine binary package licenses.

* Most package maintainers probably assume that License: tags on all
  built RPMs (source RPMs and binary RPMs) should reflect binary package
  contents, at least when all subpackages are considered in aggregate.
  Often, Source RPMs contain the same License: line as binary RPMs.

* No algebraic simplifications on License: lines are performed.

* All forms of dynamic linking are ignored for License: tags.  This
  covers ELF (e.g., C, C++), but also Python, Java, and other languages
  with late binding.

* C/C++ header file content is ignored for License: tags, regardless of
  header file complexity (e.g., substantial code in templates or inline
  functions is not treated specially).

* Statically linked GCC and glibc startup code is ignored and does not
  show up in License: lines.  The license of glibc startup code isn't
  even in SPDX yet, so it's not just Fedora who is ignoring this.
  
* Statically linked libgcc support code is ignored (e.g., outline
  atomics on aarch64, FMV support code on x86-64).  This code comes with
  the compiler, but is compiled from C sources that ship with the
  compiler.  These items overlap with the startup code, but licensing
  could theoretically be different.

* Some shared objects come with statically linked support code.  I doubt
  that many package maintainers are aware of that, so they effectively
  ignore the licensing impact of that.  It's structurally similar to
  inline functions and templates in header files.

* Output from source code generators such as autoconf, bison and flex
  is often (but not always) ignored, in some cases even if the generated
  code ships in the source RPM and is compiled as-is, without
  regeneration.  (autoconf can generate more than just build scripts.)

* Licenses of crate build-dependencies end up in License: tags of RPM
  packages.  This is a form of static linking analysis for which we have
  tooling, and it is mandated by the guidelines.  It only covers the
  Rust part; the other gaps in filling out License: are still there.  (I
  don't know if the generated License: tags are accurate for individual
  subpackages; it seems unlikely.)  Go might have something similar.

* Sometimes we ignore upstream SPDX identifiers if we believe them to be
  incorrect, but that approach is not consistent, as far as I know.

* There seems to be some confusion about whether AND or OR is the
  right separator for SPDX tags in License: lines.

* Some package maintainers, when translating to SPDX, merely translate
  the existing License: line as best as they can, without looking at the
  actual sources or produced binaries.

I looked around a bit and there are no documented product requirements
internally, so I don't think we can justify investing in tooling or
training to improve data quality.  (I'll keep digging, though.)

In light of this, I would like to suggest updating the guidelines in
the following way:

  The License: line should be based on the sources only.  Using a tool
  such as Fossology to discover relevant licenses and their SPDX tags is
  sufficient.  No analysis of how licenses from package source code or
  the build environment propagate into binary RPMs should be performed.
  Individual SPDX identifiers that a tool has listed should be separated
  by AND.  Package maintainers are encouraged to re-run license analysis
  tooling on the source code as part of major package rebases, and
  update the License: tag accordingly.
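
As a rough illustration of how mechanical the proposed rule could be,
here is a minimal Python sketch.  It assumes the scanner output has
already been reduced to a plain list of SPDX expressions; the function
name build_license_field is hypothetical and not part of Fossology or
any Fedora tooling.  Dropping exact duplicates and parenthesizing
upstream OR choices are my own assumptions, not something the
guidelines spell out.

```python
def build_license_field(detected):
    """Join SPDX expressions reported by a scanner into one License: value.

    `detected` is an iterable of SPDX expression strings (assumed input
    format; real scanner output would need to be converted first).
    """
    seen = []
    for expr in detected:
        expr = " ".join(expr.split())  # normalize whitespace
        # Assumption: exact duplicates are dropped, first-seen order is kept.
        if expr and expr not in seen:
            seen.append(expr)

    parts = []
    for expr in seen:
        # Keep an upstream "A OR B" choice grouped when it is combined
        # with other entries, since AND binds tighter than OR in SPDX.
        if " OR " in expr and len(seen) > 1:
            parts.append(f"({expr})")
        else:
            parts.append(expr)
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_license_field(
        ["MIT", "GPL-2.0-or-later", "MIT", "Apache-2.0 OR MIT"]
    ))
```

Nothing here looks at the build environment or at what ends up in
binary RPMs, which is the point of the proposal: the result depends on
the scanned sources only.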

To me, that seems to be much more manageable.

Thoughts?

Thanks,
Florian