Re: [PATCH v2 09/39] scripts/kernel-doc.py: add a Python parser

Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> · Wed, 26 Feb 2025 07:56:47 +0100

On Tue, 25 Feb 2025 13:10:19 -0700
Jonathan Corbet <corbet@xxxxxxx> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> writes:
> 
> > On Mon, 24 Feb 2025 16:38:58 -0700
> > Jonathan Corbet <corbet@xxxxxxx> wrote:
> >  
> >> Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> writes:
> >> 
> >> 
> >> I also think you should give consideration to preserving the other
> >> copyright notices in the Perl version.  A language translation doesn't
> >> remove existing copyrights...who knows how much creativity went into
> >> some of those regexes?  
> >
> > Makes sense, but the copyrights at kernel-doc.pl:
> >
> > 	## Copyright (c) 1998 Michael Zucchi, All Rights Reserved        ##
> > 	## Copyright (C) 2000, 1  Tim Waugh <twaugh@xxxxxxxxxx>          ##
> > 	## Copyright (C) 2001  Simon Huggins                             ##
> > 	## Copyright (C) 2005-2012  Randy Dunlap                         ##
> > 	## Copyright (C) 2012  Dan Luedtke                               ##
> > 	##                                                               ##
> > 	## #define enhancements by Armin Kuster <akuster@xxxxxxxxxx>     ##
> > 	## Copyright (c) 2000 MontaVista Software, Inc.                  ##
> > 	#
> > 	# Copyright (C) 2022 Tomasz Warniełło (POD)
> >
> > Also doesn't preserve all copyrights from people that worked hard to
> > maintain it all over those years.  
> 
> Agreed ... and I'm not sure what we can do about that.  But *removing*
> existing copyright notices is a bit of a different story; that is
> generally considered to be fairly bad form.

I'm with you: we should not remove copyrights.

Yet, copyrights were originally developed for artwork (paintings, music
and such). So I guess we can borrow an analogy from there to try to
understand what a conversion like this would mean. At least to me,
it sounds like having two paintings of the same image: they both
depict the same picture, but they have different brush strokes. They
may also have different painting styles, which may look similar but are
not the same.

Using that analogy, let's say someone paints a new picture while looking
at a famous painting like the Mona Lisa. Surely the painter should give
credit to Leonardo da Vinci for his brilliant artwork but, on the other
hand, he cannot and should not sign his painting as if it had been
authored by Leonardo da Vinci.

The same applies here: the Python code, while derived from the
Perl version, doesn't have the same coding style ("brush strokes"), nor
can we say that it was authored by the original writers. IMO, all we
can do is give credit to the original authors and preserve the GPLv2
license, which explicitly allows derivative works.

That's why I think we could give such credit in a preamble note,
kept separate from the copyrights of the Python version.

It could be something like:

	# Converted from the kernel-doc script originally written in Perl
	# under GPLv2, copyrighted since 1998 by the following authors:

followed by a list of the contributors. Alternatively, it could mention
the original script and explain how people can browse its history to see
the developers who wrote or modified kernel-doc.

Feel free to suggest a better text if you think the above won't fit.

> I don't have a problem with adding a longer credits area, I guess, if we
> want to do that (though it's not normal for other source files).  But
> I'm not sure we need to.

I have the same doubts but, on the other hand, looking at the
copyright notices added to kernel-doc.pl since 2005 (the git era), I
can see entries for just three people:

- Dan Luedtke: a single patch adding html5 support
  1b40c1944db4 ("scripts/kernel-doc: added support for html5")

  We didn't port html5 to Python, and html output was removed from
  kernel-doc a long time ago. There may be some small pieces of his
  original work that ended up being ported. I dunno.

- Tomasz Warniełło: basically, changes to the help/man part of the script

  2b306ecaf57b scripts: kernel-doc: Refresh the copyright lines
  258092a89085 scripts: kernel-doc: Drop obsolete comments
  252b47da9fd9 scripts: kernel-doc: Replace the usage function
  834cf6b9039e scripts: kernel-doc: Translate the "Other parameters" subsection of OPTIONS
  c15de5a19a28 scripts: kernel-doc: Translate the "Output selection modifiers" subsection of OPTIONS
  9c77f108f43a scripts: kernel-doc: Translate the "Output selection" subsection of OPTIONS
  dd803b04b0a0 scripts: kernel-doc: Translate the "Output format selection modifier" subsection of OPTIONS
  2875f7870821 scripts: kernel-doc: Translate the "Output format selection" subsection of OPTIONS
  f1583922bf93 scripts: kernel-doc: Translate the DESCRIPTION section
  43caf1a6823d scripts: kernel-doc: Relink argument parsing error handling to pod2usage
  a5cdaea525c3 scripts: kernel-doc: Add the basic POD sections

  Parts of the text used in the POD sections were preserved in the
  Python version. I didn't check whether the texts we're using were
  authored by him.

- Randy Dunlap: 64 patches fixing things and improving the script

  I'm pretty sure I ported lots of stuff from Randy to the Python
  version.

At least to me, while it sounds right to give credit to the above
3 developers and also to Michael, Simon and Armin, who collaborated
on and authored it before the git era, it doesn't sound right to leave
out almost all of the developers who have been maintaining it since
2005. The list, ordered by number of patches, is:

     65 Randy Dunlap
     57 Mauro Carvalho Chehab
     32 Jani Nikula
     20 Jonathan Corbet
     11 Tomasz Warniełło
     11 Johannes Berg
      7 Kees Cook
      6 Vegard Nossum
      6 Aditya Srivastava
      5 Paolo Bonzini
      5 Martin Waitz
      4 Matthew Wilcox
      4 Daniel Vetter
      3 Mike Rapoport
      3 Danilo Cesar Lemes de Paula
      3 Daniel Santos
      3 Conchúr Navid
      3 Borislav Petkov
      3 Ben Hutchings
      3 Andy Shevchenko
      3 André Almeida
      3 Akira Yokosawa
      2 Yujie Liu
      2 Yacine Belkadi
      2 Sakari Ailus
      2 Pavel Pisa
      2 Pavan Kumar Linga
      2 Markus Heiser
      2 Jonathan Neuschäfer
      2 Jason Baron
      2 Jakub Kicinski
      2 Ilya Dryomov
      1 Will Deacon
      1 valdis.kletnieks@xxxxxx
      1 Utkarsh Tripathi
      1 Silvio Fricke
      1 Rolf Eike Beer
      1 Rich Walker
      1 Richard Kennedy
      1 Randy.Dunlap
      1 Pierre-Louis Bossart
      1 Peter Maydell
      1 Nishanth Menon
      1 Niklas Söderlund
      1 Michal Wajdeczko
      1 Masahiro Yamada
      1 Mark Rutland
      1 Lucas De Marchi
      1 Linus Torvalds
      1 Levin, Alexander (Sasha Levin)
      1 Laurent Pinchart
      1 Kamil Rytarowski
      1 Jonathan Cameron
      1 Johannes Weiner
      1 Jérémy Bobbio
      1 Jason Gunthorpe
      1 Horia Geanta
      1 Harvey Harrison
      1 Greg Kroah-Hartman
      1 Gabriel Krisman Bertazi
      1 Donald Hunter
      1 Dan Luedtke
      1 Coco Li
      1 Chen-Yu Tsai
      1 Bart Van Assche
      1 Anna-Maria Behnsen
      1 Alexander Lobakin
      1 Alexander A. Klimov

If you think the list is too long, one option would be to draw a line
(for instance picking developers with more than 2 patches or something
like that) and add an "and others" to not forget about the others.
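
The "draw a line" option could be sketched as below. This is a minimal
illustration, not a proposal for the final header: the function name
`credits_preamble`, the `threshold` parameter and the indentation of the
names are all assumptions; only the two opening comment lines come from
the text suggested earlier in this mail.

```python
def credits_preamble(counts, threshold=2):
    """Build comment lines crediting authors with more than `threshold` patches.

    `counts` maps author name -> number of patches. Authors at or below
    the threshold are folded into a single "and others" line.
    """
    # Most patches first; ties broken by name so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [name for name, n in ranked if n > threshold]

    lines = [
        "# Converted from the kernel-doc script originally written in Perl",
        "# under GPLv2, copyrighted since 1998 by the following authors:",
        "#",
    ]
    lines += [f"#    {name}" for name in kept]
    if len(kept) < len(ranked):
        # Someone was left out, so acknowledge them collectively.
        lines.append("#    and others")
    return lines


sample = {"Randy Dunlap": 65, "Jani Nikula": 32, "Yujie Liu": 2, "Coco Li": 1}
preamble = credits_preamble(sample)
print("\n".join(preamble))
```

With `threshold=2` this keeps only authors with 3 or more patches, which
is the cut suggested above; a very large threshold degenerates to just
"and others", and a threshold of 0 reproduces the complete list.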

We could analyze each individual contribution to see what was relevant
or not, ignoring, for instance, authors of single-line changes like
this one:

	diff --git a/scripts/kernel-doc b/scripts/kernel-doc
	index 28b761567815..f565536a2bef 100755
	--- a/scripts/kernel-doc
	+++ b/scripts/kernel-doc
	@@ -2082 +2081,0 @@ sub dump_function($$) {
	-    $prototype =~ s/__devinit +//;

which almost certainly doesn't affect copyright, since it doesn't add any
new code, while preserving credit for single-patch authors who made
regex changes like this one:

	diff --git a/scripts/kernel-doc b/scripts/kernel-doc
	index 3982d47048a7..724528f4b7d6 100755
	--- a/scripts/kernel-doc
	+++ b/scripts/kernel-doc
	@@ -1086 +1086 @@ sub dump_struct($$) {
	-    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
	+    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
	@@ -1101,0 +1102 @@ sub dump_struct($$) {
	+       $members =~ s/\s*____cacheline_aligned/ /gos;

but to me it sounds like a waste of our time to analyze all patches, and
we would risk getting things wrong, so I prefer to include the complete
list.

Thanks,
Mauro