Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)

Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> · Fri, 7 May 2021 11:14:51 +0200

On Fri, 7 May 2021 10:56:39 +0200
Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:

> On 07.05.21 at 10:35, Michal Suchánek wrote:
> > So the bottom line is that UTF-8 in the files will stay, and Sphinx
> > cannot handle UTF-8 when the locale is not UTF-8.
> > 
> > In the long run it might be nice to fix Sphinx to properly set the
> > encoding of the files it reads and writes. Or maybe there is some
> > parameter that specifies it?  
> 
> Let's not mix things up. The Unicode error is not related or limited
> to the log or to Sphinx; it is related to the fact that we (you) try to
> run a UTF-8 application in an environment which is not fully UTF-8
> functional.

No. The application itself is not UTF-8. The application input files are.

The big issue is the way Python works with charsets:
it does a very poor job of handling them.
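
To illustrate the failure mode from the subject line, here is a minimal
sketch (the sample text and the try_encode() helper are made up for this
example, they are not taken from Sphinx): text with characters above
U+00FF encodes fine as UTF-8 but cannot be encoded as Latin-1, which is
what happens when the locale makes Python pick a Latin-1 output encoding.

```python
# Illustrative text; the characters are arbitrary non-Latin-1 examples.
text = "build \u2192 \u2713"

def try_encode(s, encoding):
    """Encode s with the named codec; return the bytes, or None on failure."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None

utf8_bytes = try_encode(text, "utf-8")      # succeeds
latin1_bytes = try_encode(text, "latin-1")  # None: both symbols are above U+00FF
```

Code that passes encoding="utf-8" explicitly to open() does not depend on
the locale for this; code that relies on the default does.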

I remember that in the past I had to use this quite often
(before UTF-8 became the default on the distros I was using at the time):

	LANG=C <some_python_script>

Just to keep them from crashing.

If I'm not mistaken, older Fedora/Mandrake distros had some bugs with
Python scripts: if the machine's language was not English, such
scripts would crash, as the translated i18n messages were in a
different charset than the one the Python script expected.

> > For the short term I think it is reasonable to run a python test script
> > that prints fancy unicode characters before running Sphinx and bail if
> > the test script fails.  
> 
> To be sure, I recommend setting a UTF-8 locale environment in the
> Makefile.
> 
> My experience shows that this is the default with almost all
> containers (images); there are only a few where this is not the
> case (maybe SUSE?).

That may not be true in certain parts of the globe.

I have no idea which charsets the most-used distributions in Asian
countries use ;-)
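
For reference, the pre-flight check Michal suggested could look roughly
like the sketch below. This is only an assumption of how such a script
might be written: SAMPLE, encoding_handles() and preflight() are
hypothetical names, and nothing like this exists in the tree as far as I
know. It checks whether the encoding Python picked for stdout can
represent a few non-Latin-1 characters, and returns a non-zero status
otherwise.

```python
import sys

# Hypothetical sample; any characters outside Latin-1 would do.
SAMPLE = "Unicode check: \u2192 \u2713 \u4e2d"

def encoding_handles(sample, encoding):
    """Return True if `sample` can be encoded with `encoding`."""
    if not encoding:
        return False
    try:
        sample.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an unknown codec name.
        return False
    return True

def preflight(stream=sys.stdout):
    """Return 0 if the stream's encoding can represent SAMPLE, else 1."""
    encoding = getattr(stream, "encoding", None)
    if encoding_handles(SAMPLE, encoding):
        return 0
    print("encoding %r cannot represent UTF-8 text; "
          "set a UTF-8 locale before running Sphinx" % encoding,
          file=sys.stderr)
    return 1
```

A wrapper script would end with sys.exit(preflight()), so that the
Makefile can bail out before Sphinx starts.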

Thanks,
Mauro