Re: Sphinx parallel build error: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)

Markus Heiser <markus.heiser@xxxxxxxxxxx> · Fri, 7 May 2021 11:51:47 +0200

On 07.05.21 at 11:14, Mauro Carvalho Chehab wrote:
On Fri, 7 May 2021 10:56:39 +0200
Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:

On 07.05.21 at 10:35, Michal Suchánek wrote:
So the bottom line is that UTF-8 in the files will stay, and Sphinx
cannot handle UTF-8 when the locale is not UTF-8.

In the long run it might be nice to fix Sphinx to properly set the
encoding of the files it reads and writes. Or maybe there is some
parameter that specifies it?
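Python itself does have such a parameter, independent of Sphinx: UTF-8 mode (PEP 540, Python 3.7+), switched on with `PYTHONUTF8=1` or `python -X utf8`, makes `open()` and the standard streams default to UTF-8 whatever the locale says. (For reading sources, Sphinx also has its own `source_encoding` setting, default `'utf-8-sig'`.) A minimal sketch, with a hypothetical helper `child_stdout_encoding()`, that starts a child interpreter with extra environment variables and reports which encoding its stdout ends up with:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env: dict) -> str:
    """Hypothetical helper: run a child Python with extra environment
    variables and return the encoding its sys.stdout ends up using."""
    env = dict(os.environ)
    # PYTHONIOENCODING would override UTF-8 mode for the std streams,
    # so drop it to observe the effect of PYTHONUTF8 on its own.
    env.pop("PYTHONIOENCODING", None)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # UTF-8 mode is forced even though LC_ALL selects the plain C locale.
    print(child_stdout_encoding({"PYTHONUTF8": "1", "LC_ALL": "C"}))
```

This only shows that the interpreter can be told to ignore the locale; whether that is preferable to fixing the locale itself is exactly what the rest of this thread is about.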

Let's not mix things up. The UnicodeError is neither caused by nor
limited to logging or Sphinx; it comes from the fact that we (you) try
to run a UTF-8 application in an environment that is not fully UTF-8
capable.
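The error in the subject line can be reproduced without Sphinx: it is what happens when text containing typographic quotes or dashes is encoded with the `latin-1` codec, as a stream does when its encoding is derived from a Latin-1 locale. A minimal sketch; the helper name `first_unencodable()` is made up for illustration:

```python
def first_unencodable(text: str, encoding: str = "latin-1"):
    """Return (start, end) of the first run of characters that `encoding`
    cannot represent, or None if the whole text encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message prints it as end - 1.
        return exc.start, exc.end
    return None


sample = "abc\u2018\u2019\u2014def"  # curly quotes and an em dash: not Latin-1
print(first_unencodable(sample))           # (3, 6)
print(first_unencodable(sample, "utf-8"))  # None
```

The reported range uses the same convention as "position 18-20" in the subject: a run of three characters that Latin-1 cannot represent.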

No. The application itself is not UTF-8. The application input files are.

Maybe we have a different view on this; for me, an application that
reads UTF-8 in and spits UTF-8 out is a UTF-8 application.

Hint: HTML is just one Sphinx writer; there are other writers as well,
e.g. LaTeX.

The big issue is the way Python handles charsets: it does a very poor
job in that regard.

This is your POV; the Python developers have a different view on
handling strings.  There have been epic discussions about it.

But all these discussions won't help, since we can't change the
principles of Python.

Personally, I don't think I can ignore the principles of a language,
and I'm fine with setting up a UTF-8 environment.

I remember that in the past I had to use this quite often
(before UTF-8 became the default on the distros I was using at the time):

	LANG=C <some_python_script>

Just to keep them from crashing.

If I'm not mistaken, older Fedora/Mandrake distros had bugs where
Python scripts crashed if the machine's language was not English,
because the translated i18n messages were in a different charset
than the one the script expected.

For me, "i18n translated messages" are a good example of why my
opinion is not wrong.  Not every device has a UTF-8 environment, but
those are not the devices where you would run an application like Sphinx.

For the short term I think it is reasonable to run a python test script
that prints fancy unicode characters before running Sphinx and bail if
the test script fails.
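A minimal sketch of such a pre-flight check, assuming it lives in a hypothetical `check_utf8.py` that runs before Sphinx. Instead of printing and waiting for the crash, it asks whether the stream's encoding can represent the sample text, which amounts to the same test:

```python
import sys

# Sample text: Latin-1 can represent the "a acute" but not the curly quotes
# or the check mark, so this fails on any stream that is not UTF-8 capable.
SAMPLE = "Such\u00e1nek \u201cUTF-8\u201d \u2713"


def stream_can_print(sample: str = SAMPLE, stream=None) -> bool:
    """Return True if `sample` can be written to `stream` (default: stdout)."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        sample.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an encoding name Python does not know at all.
        return False
    return True


if __name__ == "__main__":
    if not stream_can_print():
        # Keep the error message ASCII-only so it can always be printed.
        sys.stderr.write("error: stdout is not UTF-8 capable; "
                         "set a UTF-8 locale before running Sphinx\n")
        sys.exit(1)
    print(SAMPLE)
```

The build could then run `python3 check_utf8.py` first and stop on a non-zero exit status. Sphinx also writes output files, so this checks only the console side of the problem.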

To be sure, I recommend setting a UTF-8 locale environment in the
Makefile.
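In a Makefile this would amount to exporting something like `LC_ALL=C.UTF-8` for the Sphinx targets. To stay in one language, here is the same idea as a small Python wrapper; the function names, the choice of `C.UTF-8` and the extra `PYTHONUTF8=1` are assumptions for illustration, not taken from any existing Makefile:

```python
import os
import subprocess
import sys


def utf8_build_env(base=None, locale_name: str = "C.UTF-8") -> dict:
    """Return a copy of `base` (default: os.environ) forcing a UTF-8 locale.

    C.UTF-8 is an assumption: current glibc-based distros ship it, but
    other systems may need e.g. en_US.UTF-8 instead.
    """
    env = dict(os.environ if base is None else base)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name  # LC_ALL overrides all other LC_* settings
    env["PYTHONUTF8"] = "1"      # Python 3.7+ UTF-8 mode, independent of locale
    return env


def run_sphinx(srcdir: str, outdir: str, builder: str = "html") -> int:
    """Hypothetical wrapper: run Sphinx via `python -m sphinx` in a UTF-8 env."""
    cmd = [sys.executable, "-m", "sphinx", "-b", builder, srcdir, outdir]
    return subprocess.run(cmd, env=utf8_build_env()).returncode
```

Setting both the locale and UTF-8 mode is belt and braces: the locale also covers non-Python tools in the same recipe, while UTF-8 mode still helps if the chosen locale is missing on the build host.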

My experience shows that this is the default in almost all
container images; there are only a few where this is not the
case (maybe SUSE?).

That may not be true on certain parts of the globe.

Sorry, I was speaking about common LXC images.

I have no idea which charsets the most-used distributions in Asian
countries use ;-)

I guess these days they mostly use UTF-8, since ASCII wasn't
much help to them even back in the 80s ;-)

  -- Markus --