I think Richard said that he would start a thread like this, but it
hasn't happened, so I feel like I should get this off my chest now.

<https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis>
starts with this:

| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.

This is contradictory. I think there are two aspects here:

* Determine possible licenses that end up in the binary package.

* Perform algebraic simplifications on the license list.

Both analyses are forms of effective licensing analysis. Of course, you
cannot derive an SPDX identifier without doing any analysis. However, I
strongly believe that the first approach (determining the binary package
license) is itself a form of effective licensing analysis, and similar
reasons for package maintainers not doing this apply. The derived SPDX
identifier will reflect both the package source code and what went into
the build system.

Below, I'm collecting a list of observations of what I believe is the
current approach in this area, as taken by package maintainers carrying
out the SPDX conversion. To me, it strongly suggests that the SPDX
identifiers we derive today do not accurately reflect binary RPM package
licensing, even when lots of package maintainers put in the extra effort
to determine binary package licenses.

* Most package maintainers probably assume that License: tags on all
  built RPMs (source RPMs and binary RPMs) should reflect binary package
  contents, at least when all subpackages are considered in aggregate.
  Often, source RPMs contain the same License: line as binary RPMs.

* No algebraic simplifications on License: lines are performed.

* All forms of dynamic linking are ignored for License: tags. This
  covers ELF (e.g., C, C++), but also Python, Java, and other languages
  with late binding.

* C/C++ header file contents are ignored for License: tags, regardless
  of header file complexity (e.g., substantial code in templates or
  inline functions is not treated specially).

* Statically linked GCC and glibc startup code is ignored and does not
  show up in License: lines. The license of glibc startup code isn't
  even in SPDX yet, so it's not just Fedora that is ignoring this.

* Statically linked libgcc support code is ignored (e.g., outline
  atomics on aarch64, FMV support code on x86-64). This code comes with
  the compiler, but is compiled from C sources that ship with the
  compiler. These items overlap with the startup code, but licensing
  could theoretically be different.

* Some shared objects come with statically linked support code. I doubt
  that many package maintainers are aware of that, so they effectively
  ignore its licensing impact. It's structurally similar to inline
  functions and templates in header files.

* Output from source code generators such as autoconf, bison and flex
  is often (but not always) ignored, in some cases even if the generated
  code ships in the source RPM and is compiled as-is, without
  regeneration. (autoconf can generate more than just build scripts.)

* Licenses of crate build-dependencies end up in License: tags of RPM
  packages. This is a form of static linking analysis for which we have
  tooling, and it is mandated by the guidelines. It only covers the Rust
  part; other gaps in filling out License: are still there. (I don't
  know if the generated License: tags are accurate for individual
  subpackages; it seems unlikely.) Go might have something similar.

* Sometimes we ignore upstream SPDX identifiers if we believe them to be
  incorrect, but that approach is not consistent, as far as I know.

* There seems to be some confusion about whether AND or OR is the right
  separator for SPDX tags in License: lines.

* Some package maintainers, when translating to SPDX, merely translate
  the existing License: line as best as they can, without looking at the
  actual sources or produced binaries.

I looked around a bit and there are no documented product requirements
internally, so I don't think we can justify investing in tooling or
training to improve data quality. (I'll keep digging, though.)

In the light of this, I would like to suggest updating the guidelines in
the following way:

    The License: line should be based on the sources only. Using a tool
    such as Fossology to discover relevant licenses and their SPDX tags
    is sufficient. No analysis of how licenses from package source code
    or the build environment propagate into binary RPMs should be
    performed. Individual SPDX identifiers that a tool has listed should
    be separated by AND. Package maintainers are encouraged to re-run
    license analysis tooling on the source code as part of major package
    rebases, and update the License: tag accordingly.

To me, that seems to be much more manageable.

Thoughts?

Thanks,
Florian