On Thu, 6 May 2021 20:06:25 +0200
Michal Suchánek <msuchanek@xxxxxxx> wrote:

> On Thu, May 06, 2021 at 07:53:25PM +0200, Markus Heiser wrote:
> > Hi Mauro,
> >
> > it is not comfortable but is it mad? ..
> >
> > Most often languages (or applications) do not handle encoding
> > of strings they just piping a binary stream while python
> > decode / encodes strings.
> >
> > "The Zen of Python" [1] says
> >
> >     Explicit is better than implicit.

This was taken to an extreme with regard to charsets: "better"
should never be translated to "crash" ;-)

> > If a stream can't encode symbols and these symbols should be ignored
> > you have to set the encoding of the stream explicit to ignore
> > such symbols.
>
> The problem is this part never happened. Loggers are supposed to tell
> you about the error in your application, not crash it.

It is insane to crash the error log due to a charset issue ;-)

> But the problem with Sphinx may be that the output file is also assumed
> to be in the locale encoding, and the output encoding is never set. It's
> HTML so it could be encoded with entities, too.
>
> The idea about handling encoding precisely is not mad in itself but then
> everybody working with just ASCII and never testing their software works
> in the cases where explicit handling is needed is the mad part.

True. The machine's locale shouldn't affect the produced documents
*at all*. See, there's a whole family of non-Latin charsets
supported on Linux:

	https://man7.org/linux/man-pages/man7/charsets.7.html

Nothing prevents someone on a machine whose default encoding is
KOI8-R/BIG-5/GB 2312/JIS X 0208/... from using Sphinx to produce
UTF-8 [1] documents.

[1] or whatever other output encoding

Ok, the logger may not be able to correctly display certain chars,
but it would be perfectly fine and sane to use //TRANSLIT (or something
similar) in order to do a charset conversion.
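To make that concrete, here is a minimal sketch of the "something
similar" in Python, using the standard codec error handlers rather than
iconv's //TRANSLIT. It is not what Sphinx does; make_lossy_handler and
the "charset-demo" logger name are made up for illustration, and the
BytesIO stands in for a terminal whose locale charset is KOI8-R.

```python
import io
import logging


def make_lossy_handler(binary_stream, encoding):
    """Hypothetical helper: build a log handler for a limited charset.

    Characters that `encoding` cannot represent are written as '?'
    (errors="replace") instead of raising UnicodeEncodeError.
    """
    text_stream = io.TextIOWrapper(
        binary_stream, encoding=encoding, errors="replace", newline="\n"
    )
    return logging.StreamHandler(text_stream)


# Stand-in for a terminal running with a KOI8-R locale.
raw = io.BytesIO()

log = logging.getLogger("charset-demo")
log.propagate = False
log.addHandler(make_lossy_handler(raw, "koi8-r"))

# 'á' does not exist in KOI8-R; the Cyrillic text does.
log.warning("Suchánek: ошибка")
output = raw.getvalue()

# For HTML output, entities are another lossless option.
html_bytes = "Suchánek".encode("ascii", errors="xmlcharrefreplace")
```

For the real stderr, sys.stderr.reconfigure(errors="replace") (Python
3.7 or newer) should have the same effect without wrapping anything.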
Even just printing a <?> for every char that isn't printable on the
logger's output, using the charset set by LANG/LC_*, is better/saner
than crashing.

Thanks,
Mauro