On Fri, May 07, 2021 at 11:51:47AM +0200, Markus Heiser wrote:
> On 07.05.21 at 11:14, Mauro Carvalho Chehab wrote:
> > On Fri, 7 May 2021 10:56:39 +0200
> > Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
> >
> > > On 07.05.21 at 10:35, Michal Suchánek wrote:
> > > > So the bottom line is that UTF-8 in the files will stay, and Sphinx
> > > > cannot handle UTF-8 when the locale is not UTF-8.
> > > >
> > > > In the long run it might be nice to fix Sphinx to properly set the
> > > > encoding of the files it reads and writes. Or maybe there is some
> > > > parameter that specifies it?
> > >
> > > Let's not mix things up. The Unicode-Error is not related or limited
> > > to log nor to sphinx, it is related to the fact that we (you) try to
> > > run a utf-8 application in an environment which is not full utf-8
> > > functional.
> >
> > No. The application itself is not UTF-8. The application input files are.
>
> May be we have a different view on this, for me an application which
> reads UTF-8 in and spids out UTF-8 is an UTF-8 application.
>
> hint: HTML is just one Sphinx writer, there exist also other writers
> e.g. LaTeX.

And just as the browser can display HTML documents in pretty much any
character set independently of your system locale, Sphinx should be able
to produce those documents for your browser to display independently of
the system locale. Same for LaTeX, PDF, or whatever else.

> > The big issue with the way python works with charsets is due to that:
> > it does a very poor job with regards to that.
>
> This is your POV, the python developers have a different view on
> handling strings. There are epic discussions around about.
>
> But all this discussions won't help, since we can't change the
> principles of python.

It has nothing to do with the python developers' POV on handling strings
or with the principles of python. The python support for handling strings
is complete in the sense that it does not depend on the system locale and
can handle strings in multiple character sets.
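
To illustrate the point, here is a minimal sketch. It is not taken from
Sphinx, and the helper name roundtrip is made up for this example. It
assumes Python 3 and shows that once the encoding is passed explicitly to
open(), the result no longer depends on LANG or LC_ALL:

```python
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary file and read it back.

    The encoding is named explicitly on both open() calls, so the
    locale's preferred encoding is never consulted.
    (Hypothetical helper, for illustration only.)
    """
    fd, path = tempfile.mkstemp(suffix=".rst")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# Non-ASCII sample, written with an escape so the snippet itself is ASCII.
sample = "Michal Such\u00e1nek"
result = roundtrip(sample)
```

The same helper works with any codec python ships that can represent the
text, for example roundtrip(sample, encoding="latin-1"). Without the
encoding argument, open() falls back to the locale's preferred encoding,
which is where a non-UTF-8 locale starts to matter.
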
Sphinx, as a program written in python, could handle documents in any
encoding supported by python, independently of the system locale, if the
Sphinx developers bothered to use the python encoding support correctly.
Apparently they did not.

> Personally I think I can't ignore the principles of a language
> and I'm feeling well with setting up an UTF-8 environment.
>
> > I remember that in the past I had to use this quite often
> > (before UTF-8 being default on the distros I was using on that time):
> >
> > 	LANG=C <some_python_script>
> >
> > Just to avoid them to crash.
> >
> > If I'm not mistaken, older Fedora/Mandrake distros had some bugs with
> > python-written scripts that, if the machine's language were not
> > English, such scripts crash, as the i18n translated messages were
> > on a different charset than what the python script would be expecting.
>
> For me "i18n translated message" is a good example that I'm not
> wrong with my opinions. This is not true for all devices but
> on those device you won't run an applications like Sphinx.

Or it's a good example of people never testing the application for the
case where explicit handling is required, and possibly one of the
reasons more requirements for explicit handling of the encoding were
added. In the end it merely led to a change from a universal ASCII
encoding to a universal UTF-8 encoding, with no support for running
python scripts in any locale that does not use the 'universal' encoding.

I think the idea was to make scripts resilient to encoding errors and to
prevent data corruption by raising an exception when mishandling of the
encoding is detected, but instead of handling the exceptions people just
punted to using the same encoding all the time.

Thanks

Michal