SPDX in the kernel: State of the union

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Folks!

After the initial SPDX effort which ended about three years ago there
was not really much progress neither in terms of file statistics nor in
terms of activity on this list... I'm refraining from asking the obvious
questions...

Nevertheless I'm trying to cut myself some cycles to get this rolling
again.

As a first step I tried to resurrect my old scripts. That was not really an
enjoyable experience due to the python2 -> python3 fallout and the changes
in scancode since then.

Though after quite some cursing I was able to gather at least initial
statistics and to analyze patches based on the scancode detection rules.

I surely have to say quite some words about the 'improved' scancode
detection rules too, but I sort that out with Philippe off-list.

So here is where we are:

Files without SPDX identifier:		16410	~78% of total files

Files without any license hint:	         7131   ~43% of !SPDX'ed files
Files with one license hint:		 6673   ~40% of !SPDX'ed files
Files with two license hints:            2267   ~13% of !SPDX'ed files
Files with more than two hints:           339   ~ 2% of !SPDX'ed files

Files with less than 4 lines content:

        0 length:	   33   (some can be removed)
	1 line:		  276
	2 lines:	  109
	3 lines:	  135

Files without any license hint:

        arch                 774
	block		       1
	certs		       2
	crypto		      10
	Documentation	    4266
	drivers		     320
	fs		      26
	include		     124
	init		       0
	ipc		       0
	kernel		      14
	lib		      26
	mm		       3
	net		      15
	samples		       7
	scripts		      63
	security	       8
	sound		       9
	tools		    1457
	usr		       0
	virt		       0

Files with one license hint:

        arch		    1405
	block		       0
	certs		       1
	crypto		       1
	Documentation	      65
	drivers		    4369
	fs		     126
	include		     356
	init		       0
	ipc		       1
	kernel		      18
	lib		      35
	mm		       4
	net		      69
	samples		      14
	scripts		      26
	security	       0
	sound		      40
	tools		     141
	usr		       1
	virt		       0

Files with two license hints:

        arch		     731
	block		       0
	certs		       0
	crypto		       3
	Documentation	      13
	drivers		    1114
	fs		      66
	include		     101
	init		       0
	ipc		       0
	kernel		       0
	lib		      54
	mm		       0
	net		      91
	samples		      39
	scripts		       5
	security	       1
	sound		      14
	tools		      35
	usr		       0
	virt		       0

Script-able files with reasonable effort:

	No hint:            6501 ~90% of no-hint files
	One hint:	    5129 ~76% of one-hint files
	Two hints:	     584 ~25% of two-hint files
	Total:		   12213 ~75% of !SPDX'ed file

	Remaining:          4197 ~5% of total files

Scancode rules involved:     561
Scancode rules validated:    117

My plan is to focus on the 'low hanging' fruit of reasonably easy
script-able files first.

For the files with zero hints that requires a few questions to be answered
upfront:

   1) What's the approach for files with obviously not copyright-able
      content:

      - Files which just include other file[s] (one or two lines)

      - Files which have just a more or less useful comment why they
      	are otherwise empty (one to three lines)

      - Files which just contain a #define FOO and an include of
        another file to compile the included file with some other
	functionality (two or three lines)

   2) What's the approach for machine generated files:

      - Primarily kernel configuration files

   3) What's the approach for 'hidden' dot-files like .gitignore:

      Those files are just providing information to tools. The file format
      is defined by the tool (git, clang, coccinelle....) and the creative
      content is exactly zero...

   4) What's the approch for binary blobs or other files which cannot carry
      license information in the file itself?

Which is related to the discussion in this thread:

  https://lore.kernel.org/all/20220516101901.475557433@xxxxxxxxxxxxx

The other question for these files with zero hints is which license to
chose. Sure you can argue that all files w/o any hint fall under the
project license, but especially the Documentation directory is interesting
as it's not clear for all of the various content what the preferred and
assumed license should be. That needs some thoughts and clarifications.
For the kernel code itself that's not a real question, but the tools
directory might need some care too.

For the files which have a licensing hint in whatever form, I think
resuming the work where we left off, i.e. mainly reviewing per scancode
match rules based patterns, makes a lot of sense.

Based on my cursory validation of those patterns I'm confident that we can
reach a 95% coverage within a reasonable amount of time.

Finally here is another round of important questions:

  #1 Is there still interest to get this done? The silence on this list
     after the initial effort is deafening.

  #2 Are there still enough interested and comptent people on this list to
     handle the legal questions?

  #3 Was there any progress on the outstanding questions on this list where
     discussion dried out almost 3 years ago?

I'm willing to pull the cart again, but if the interest and support stays
around zero, I surely have other things to do.

Thanks,

	Thomas



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux