Re: [PATCH 0/6] Address issues with SPDX requirements and PEP-263

Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx> · Sat, 7 Sep 2019 13:22:59 -0300

On Sat, 7 Sep 2019 16:36:36 +0200
Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:

> On 07.09.19 at 15:34, Jonathan Corbet wrote:
> > On Thu,  5 Sep 2019 16:57:47 -0300
> > Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx> wrote:
> >   
> >> The description at Documentation/process/license-rules.rst is very strict
> >> with regard to the position where the SPDX tags should be.
> >>
> >> In the past, several developers and maintainers interpreted it in a
> >> more permissive way, placing the SPDX header between lines 1 and 15,
> >> which are the ones that the scripts/spdxcheck.py script verifies.
> >>
> >> However, recently, devs have become stricter about this
> >> requirement and want it to strictly follow the rule, which states that
> >> the SPDX tag should be on the very first line of most files, and
> >> on the second line for scripts.
> >>
> >> Well, for Python scripts, this requirement violates PEP-263,
> >> causing regressions on scripts that contain encoding lines, as PEP-263
> >> also constrains where the encoding line may be placed.
> >>
> >> This series addresses it.  
> > 
> > So I really don't want to be overly difficult here, but I would like to
> > approach this from yet another angle...
> >   
> >> Patches 1 to 3 fix some Python scripts that violate PEP-263;
> > 
> > I just checked all of those scripts, and they are all just plain ASCII.
> > So it really doesn't matter whether the environment defaults to UTF-8 or
> > ASCII here.  So, in other words, we really shouldn't need to define the
> > encoding at all.

I'm not a Python expert but, from what I researched and from what I
understood from Markus, if a script tries to print a UTF-8 string while
the system's encoding is ASCII (or some other encoding), the Python script
will crash.
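
To illustrate the failure mode I mean, here is a minimal sketch. It does
not touch the real stdout: it assumes that an in-memory stream opened with
a given encoding behaves like a terminal configured with that encoding.
The helper name try_print and the sample string are mine, just for
illustration.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns True if the text could be written, False if Python raised
    UnicodeEncodeError (which would abort a script that does not catch it).
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


# "C\u00e2mera" contains one non-ASCII character (a with circumflex).
print(try_print("C\u00e2mera", "ascii"))
print(try_print("C\u00e2mera", "utf-8"))
print(try_print("Camera", "ascii"))
```

Only the first call should fail, as the string cannot be represented in
ASCII.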

At least in media, we define that some kernel strings can be UTF-8.
See, for example, the model field of the media_entity struct:

	https://linuxtv.org/downloads/v4l-dvb-apis/kapi/mc-core.html

As stated there:

	"media_entity.model must be filled with the device model name as
	 a NUL-terminated UTF-8 string. The device/model revision must
	 not be stored in this field."

I have no idea whether the two perf scripts that contain the encoding line
are meant to print strings that may be UTF-8 encoded (like those that
we have in the media subsystem), or whether whoever added it was simply
using Emacs and wanted to make their life simpler. As it is better
to be safe than sorry, in patches 2 and 3 I'm assuming the first case.

In any case, we do need the encoding line in Sphinx extensions,
although there the shebang line is optional.

In other words, we have these alternatives:

1) Neither shebang nor coding -> SPDX will be at the first line;
2) shebang + SPDX -> SPDX will be at the second line;
3) shebang + coding + SPDX -> SPDX will be at the third line;
4) coding + SPDX

   This is something that only makes sense for Sphinx extensions.

   IMHO, I would place SPDX at the second line here too, but I *guess* Python
   may accept it at the first line and would still properly evaluate
   the coding line (as this technically satisfies the text of PEP-263).
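
A quick way to check that guess is the standard tokenize module, which
reports the encoding it detects from the first two lines of a source
file. The sketch below is only an approximation of what the interpreter
itself does. The latin-1 cookie is a hypothetical choice, used so that a
detected cookie can be told apart from the utf-8 default; the shebang and
SPDX strings are also just examples.

```python
import io
import tokenize

SHEBANG = b"#!/usr/bin/env python3\n"
CODING = b"# -*- coding: latin-1 -*-\n"   # hypothetical, non-default cookie
SPDX = b"# SPDX-License-Identifier: GPL-2.0\n"


def source_encoding(header):
    """Return the encoding that tokenize detects for the given source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(header).readline)
    return encoding


layouts = {
    "shebang + coding + SPDX": SHEBANG + CODING + SPDX,
    "coding + SPDX": CODING + SPDX,
    "SPDX + coding": SPDX + CODING,
    "shebang + SPDX + coding": SHEBANG + SPDX + CODING,
}

for name, header in layouts.items():
    # "utf-8" here means the cookie was NOT picked up (default was used).
    print(name, "->", source_encoding(header))
```

If the cookie is honoured, tokenize reports it under its normalized name
(iso-8859-1). With the coding line pushed down to the third line, it
should fall back to utf-8, which is the regression that a strict
"SPDX at the second line" rule causes for scripts with a shebang.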

> >   
> 
> That's what I mean [1] .. let's patch the description in license-rules.rst::
> 
> - first line for the OS (shebang)
> - second line for environment (python-encoding, editor-mode, ...)
> - third and more lines for application (SPDX use) ..
> 
> [1] https://www.mail-archive.com/linux-doc@xxxxxxxxxxxxxxx/msg33240.html
> 
> -- Markus --
> 
> > This suggests to me that we're adding a bunch of complications that we
> > don't necessarily need.  What am I missing here?
> > 
> > Educate me properly and I'll not try to stand in the way of all this...
> > 
> > Thanks,
> > 
> > jon
> >   

Thanks,
Mauro