Hi Philippe,

See comments on tooling considerations below:

> On Jun 12, 2019, at 4:26 AM, Philippe Ombredanne <pombredanne@xxxxxxxx> wrote:
>
> Hi Jilayne:
>
> On Wed, Jun 12, 2019 at 7:25 AM J Lovejoy <opensource@xxxxxxxxxxx> wrote:
>>
>> GOAL: The over-arching goal here is to provide clear, concise, and
>> machine-readable license information at the file-level for the Linux
>> kernel by placing SPDX License List short identifiers at the top of
>> each file in order to make it easier for downstream users and
>> distributors to use automated processes and comply with the applicable
>> license terms.
>>
>> NOTE: The guidance is either to REPLACE the existing license notice
>> with the SPDX license identifier or ADD the SPDX license identifier.
>> The rationale here is that where the license notice is clear, then
>> replacing should be okay as this is essentially upgrading the current
>> notice to something more modern and machine readable. But everywhere
>> else, a conservative approach of adding the SPDX identifiers (and as
>> such, keeping the existing license notice info) means that others can
>> see both. This also avoids the need to create or retain some file with
>> all the removed notices, which seems to be distasteful and untenable
>> based on the threads related to that topic. The SPDX identifier still
>> needs to be accurate, of course.
>>
>> TOOLING CONSIDERATION: To make it easier on tooling, putting some kind
>> of START/END notation, as Steve has recommended,
>
> Having some convention to enclose a notice in some markers would have
> no impact and would not make it easier for scancode: the notice would
> be detected and reported whether it is enclosed in markers or not. This
> could be leveraged later as a way to speed things up of course, but
> that's minor.
>
> If tagging notice text boundaries is the route selected for the
> kernel, then it is worth crafting something that is well thought out
> as the kernel ways **will** surely be adopted by other projects.
>
> FWIW, here are a few examples of using such markers that exist in the
> wild from a quick grep in scancode license notices database:
>
> - Mozilla: BEGIN LICENSE BLOCK/ END LICENSE BLOCK
> - Apple: @APPLE_LICENSE_HEADER_START@
> @APPLE_LICENSE_HEADER_END@ and some variations
> - Oracle: CDDL HEADER START/END , GPL HEADER START/END, LGPL HEADER
> START/END used with their highly impractical "DO NOT ALTER OR REMOVE
> COPYRIGHT NOTICES OR THIS FILE HEADER."
> - LICENSE_START/LICENSE_END (and variations such as %%%LICENSE_START
> used in some man pages and tools including the kernel)
> - BSDCOPYRIGHTBEGIN, ECOSGPLCOPYRIGHTBEGIN and other variations in eCos.
> - Qt and KDE: QT_BEGIN_LICENSE with variations
> - COPYRIGHTBEGIN/END
> - Begin-Header/End-Header
> - BEGIN LICENSE TEXT/END LICENSE TEXT
>
>> thus allowing tooling
>> to ignore what’s enclosed there and just read the SPDX identifier as
>> the definitive license notice.
>
> There is something inconsistent here:

Well, it’s not inconsistent, really, and it is actually consistent with the SPDX spec… more below...

> either a custom notice or
> disclaimer is needed and has some legal importance
> or it has none and
> should be removed.

But you are making the wildly optimistic assumption that such a determination is black and white; it is not. Reasonable attorneys may disagree as to what is of “legal importance” or not in these cases and, ultimately, we don’t decide; a judge does. This is the challenge, as I implicitly note below, of the Linux kernel not having its own lawyer or one point of responsibility. We are collectively making a decision that impacts lots and lots of Linux users.
If we had some text that we all generally agreed was not substantively adding anything to the standard disclaimer text, and thus just used the GPL-2.0-[only / or-later] tag and didn’t add a new SPDX identifier to represent the non-substantive text, we’d basically be following the advice of the SPDX spec for section 4.6 License Information in File (represented by the stuff found in the file that we left there, but enclosed in some kind of denotation) and section 4.5 Concluded License (represented by the SPDX license identifier).

Don’t get me wrong - the best case scenario for these kinds of things is to have the copyright holder clean it up - but I’m just trying to come up with something for when that’s not feasible, something that is a bit on the conservative side and accommodates the concerns raised about full-scale removal.

> If it has some importance and needs to be kept,
> then I cannot "just read the SPDX identifier as the definitive license
> notice" as you wrote, I think I would need to consider both the id and
> extra notice. Or am I missing something?

Yes and no. See above :)

>
>> As time goes by, if copyright holders
>> come across these files and want to remove the original notices, then
>> they have the right to do so.
>>
>> GUIDANCE: The following is meant to provide some high-level guidance
>> for how to handle common scenarios and triage the approaches to reach
>> the stated goal.
>> The following is not intended to be legal advice. Rather, it is meant
>> to reflect the intention of the participating individuals to improve
>> the quality and machine-readability of the applicable license
>> information in Linux kernel files. The approach described below has
>> been developed with the Linux kernel in mind and might not be
>> appropriate for other projects or communities.
>>
>> #1 Where a file contains the standard license notice as stated in
>> the GPL-2.0 license text for GPL-2.0-only or GPL-2.0-or-later and no
>> other license information whatsoever --> then REPLACE the standard
>> license notice with the SPDX identifier for the relevant license.
>>
>> #2 Where a file contains a non-substantive variation on the standard
>> GPL-2.0 license notice, but still provides clear distinction as to
>> GPL-2.0-only or GPL-2.0-or-later consistent with the intent of the
>> standard license notice and no other license information whatsoever
>> --> then REPLACE the standard license notice with the SPDX identifier
>> for the relevant license.
>>
>> #3 Where a file contains a license notice that is non-standard as
>> compared to that stated in the GPL-2.0 license text but is nonetheless
>> clear as to GPL-2.0-only or GPL-2.0-or-later and no other license
>> information whatsoever --> then REPLACE the standard license notice
>> with the SPDX identifier for the relevant license.
>>
>> NOTES RELATED TO #1-3:
>> The SPDX identifier is simply a more concise way to express the same
>> intention regarding what license applies to the file as the standard
>> license notice, but does so in a reliable, machine-readable way that
>> meets the needs of modern software supply chain use and efforts to
>> automate detection of license information in order to facilitate more
>> complete license information and license compliance. One consideration
>> is whether replacing existing license notices with more concise,
>> machine-readable expression of the same information could run afoul of
>> a strict reading of GPL-2.0, section 1.
>> Such a strict reading applied to the scenarios described in #1-3 is
>> unconvincing for the following reasons:
>> * Although the license text itself recommends the use of the standard
>> license notice, it is not a hard requirement of the license. The
>> definitive text, as always, is the full text of the license itself.
>> Notably, the license author/steward, the Free Software Foundation
>> (FSF), encourages use of the standard header, but more broadly
>> recommends clear communication of the license variant chosen for the
>> given work as seen in various pages on their site.[1] Furthermore,
>> Richard Stallman endorsed the use of the revised SPDX identifiers for
>> helping provide clarity as to whether a licensor has chosen the
>> license-version-only or any-later-version option.[2]
>> * This project to improve license information in the Linux kernel
>> files has been discussed among kernel developers, on kernel mailing
>> lists, and documented in public files and documentation beginning in
>> mid-2017[3] to which many kernel copyright holders past and present have
>> access and would be likely to see and which has received positive
>> response and encouragement.
>> [1] See https://www.gnu.org/licenses/gpl-howto.html which provides the
>> standard license notice, but then also goes on to suggest
>> (https://www.gnu.org/licenses/gpl-faq.en.html#LicenseCopyOnly)
>> one clear and explicit statement such as, “This program is released
>> under license FOO”. FAQ questions and
>> https://www.gnu.org/licenses/gpl-faq.en.html#NoticeInSourceFile
>> also stress the general need for
>> clarity without mandating use of the specific standard license notice.
>> [2] See https://www.gnu.org/licenses/identify-licenses-clearly.html
>>
>> #4 Where the file contains a license notice that clearly states the
>> file is licensed under “GPL” with no indication of version number
>> and no other license information whatsoever --> ADD SPDX identifier
>> for GPL-2.0-or-later
>> Rationale: This is consistent with the text of the license which
>> states, “If the Program does not specify a version number of this
>> License, you may choose any version ever published by the Free
>> Software Foundation.” Because the Linux kernel is well-known to be
>> licensed under GPL-2.0-only and use of GPL-1.0 is generally sparse, it
>> is within the options given in the license text to choose
>> GPL-2.0-or-later in this case. Doing so more easily enables use of
>> such files beyond the Linux kernel.
>
> Just FYI, I am fine with a GPL-2.0-or-later choice for the kernel,
> but scancode will report these cases as GPL-1.0-or-later.

Good to know, thanks. I don’t think that is an issue, agree?

>
>> #5 Where the file contains a license notice that: a) refers to the
>> COPYING file or another specific file (or references GPL and the
>> COPYING or another specific file) with no other information as to the
>> specific license whatsoever; and b) the COPYING or other specific file
>> can be located and is clearly a copy of GPL-2.0 --> ADD SPDX
>> identifier for GPL-2.0-only
>> Rationale: This is similar to #4, but the combination of a clear
>> reference to a specific license file and the fact that the Linux
>> kernel is clearly intended to be GPL-2.0-only leads to the intent that
>> this is also GPL-2.0-only. The COPYING file currently in the kernel is
>> at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/COPYING,
>> and refers to GPL-2.0-only.
>> The (earlier) version of the
>> COPYING file also had Linus expressing GPL-2.0-only: see
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/COPYING?id=1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
>>
>> #6 Where a file contains a license notice that is non-standard as
>> compared to that stated in the GPL-2.0 license text but is nonetheless
>> clear as to GPL-2.0-only or GPL-2.0-or-later and there is other
>> license information, and that license information contains the
>> following:
>>    #6a An existing known additional license or exception
>>    for which there is an SPDX identifier
>>       --> ADD appropriate SPDX license expression
>>       (use of AND, OR, WITH), where person making change does not
>>       represent copyright holders for file
>>       --> REPLACE with appropriate SPDX license
>>       expression, where person(s) making or signing-off on changes
>>       represent copyright holders
>>    #6b An additional license or exception for which
>>    there is no SPDX identifier as per the existing SPDX License List
>>    Matching Guidelines:
>>       -- If clearly a different license and use is
>>       more than one or two files, then submit for addition to SPDX
>>       License List at http://13.57.134.254/app/submit_new_license/
>>       -- If close to an existing license/exception
>>       on the SPDX License List such that the SPDX license’s matching
>>       markup might be extended to accommodate as a match, submit to
>>       SPDX legal team for review of such.
>>       -- If some mess of a license that is unclear,
>>       an abomination, contains non-free elements, or otherwise poses
>>       some kind of challenge, then attempt to contact copyright holders
>>       to change license with recommendation
>>    #6c An additional or different disclaimer or
>>    warranty text:
>>       -- Where the copyright holders of the file in
>>       question can be contacted, then ask them to remove this and use
>>       the appropriate SPDX identifier for GPL
>>       -- Where copyright holders of the file in
>>       question cannot be easily contacted or found, then analyze
>>       differences between additional disclaimer text and standard
>>       disclaimer included in GPL, then:
>>          --> If additional disclaimer text
>>          adds no additional substantive aspects to the standard GPL
>>          disclaimer, REPLACE with appropriate SPDX license identifier
>>          for GPL-2.0
>>          --> If additional disclaimer text
>>          adds additional substantive aspects to the standard GPL
>>          disclaimer, ADD the appropriate SPDX license identifier for
>>          GPL-2.0
>> ========
>> Please note: while I am a lawyer, I do not represent any kernel
>> developers nor any of the people involved in this work. I understand
>> that no lawyer could represent the interest of the Linux kernel and
>> its many copyright holders in total. We can, however, discuss this in
>> a public forum and come up with some consensus as to reasonable
>> guidelines and rationale for such.
>> I have tried to collect the various thoughts and opinions expressed on
>> the mailing list on these topics.
>> I’m particularly interested in the following feedback:
>> A) This takes a somewhat conservative approach regarding retaining
>> some of the license notices and adding SPDX identifiers, rather than
>> replacing. I’d like to know from those involved in using scanning
>> tools (Thomas, Philippe) if this would be tenable.
>
> Speaking for the scanning tool in use here (i.e. the scancode-toolkit)
> having SPDX ids alone or with some extra notice has no impact.
> The SPDX id and the license notice will be detected and each detected
> text reported with its own corresponding license expression (which
> would happen to be the same and that can later be combined and
> simplified in a single expression.)
>
> It would likely not impact checkpatch.pl either since it cares only
> about the SPDX identifiers.
>
> BUT If you start to butcher the original notice (such as you remove
> the GPL notice part and keep a warranty disclaimer) the detection
> results will be butchered accordingly and that standalone disclaimer
> will be eventually detected either as a bare disclaimer with no
> related license or as a partial detection of another notice (since
> scancode eventually does a multidiff/red line comparison).

Good to know. I don’t think I intended to suggest we’d butcher up the existing notice - I think we either leave it all in and ADD the SPDX identifier, or REPLACE it all. That was what I was trying to delineate here overall. I think these disclaimer ones are particularly tricky. Might be worth trying to settle some of the other threshold issues raised by John and Richard (response to those next and in order!) and then come back to this.

Thanks,
Jilayne

> The same would likely apply to other license scanners that do not use
> a diff, though this could be amplified as regex-based scanners such as
> Fossology may get unlucky and miss having a regex for the butchered
> text and probabilistic scanners such as Licensee and many others may
> see the butchered text going below their false positive threshold and
> ignore it entirely.
>
> Therefore my advice would be either to keep a complete and consistent
> notice or to keep none e.g. avoid cherry picking parts of a notice as
> this will surely result in some license detection but not the one you
> would expect: it will likely be inconclusive and require more review.
>
> --
> Cordially
> Philippe Ombredanne
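
For illustration only, here is a minimal Python sketch of the tooling idea discussed in this thread: read the SPDX identifier as the concluded license, and set aside (without discarding) whatever notice text sits between START/END markers. The marker strings LICENSE_START and LICENSE_END are an assumption borrowed from the in-the-wild examples listed above, since no kernel convention has been chosen, and read_header is a hypothetical helper that is not part of scancode or checkpatch.pl.

```python
import re

# Matches the kernel-style tag in "//" or "/* ... */" comments and captures
# the license expression that follows it.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def read_header(text, start="LICENSE_START", end="LICENSE_END"):
    """Return (spdx_expression, enclosed_notice_lines) for a file's text.

    The marker names are assumptions for this sketch, not an agreed
    kernel convention.
    """
    spdx = None
    notice = []
    inside = False
    for line in text.splitlines():
        if spdx is None and not inside:
            match = SPDX_RE.search(line)
            if match:
                spdx = match.group(1)
                continue
        if start in line:
            inside = True
            continue
        if end in line:
            inside = False
            continue
        if inside:
            # Drop comment decoration but keep the notice wording intact.
            notice.append(line.strip(" */"))
    return spdx, notice


sample = """// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * LICENSE_START
 * This file is provided AS IS.
 * LICENSE_END
 */
int x;
"""

spdx, notice = read_header(sample)
print(spdx)
print(notice)
```

Returning the enclosed text alongside the identifier, rather than dropping it, keeps the notice available for review, in line with the point that a notice should be kept whole or removed whole.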