On Thu, May 06, 2021 at 07:04:44PM +0200, Markus Heiser wrote:
> On 06.05.21 at 18:46, Mauro Carvalho Chehab wrote:
> > On Thu, 6 May 2021 17:57:15 +0200
> > Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
> >
> > > On 06.05.21 at 12:39, Michal Suchánek wrote:
> > > > When building HTML documentation I get this output:
> > > ...
> > > > [ 412s] UnicodeEncodeError: 'latin-1' codec can't encode characters in position 18-20: ordinal not in range(256)
> > > ...
> > > >
> > > > It does not say which input file contains the offending character so I can't tell which file is broken.
> > > >
> > > > Any idea how to debug?
> > >
> > > I guess the build host is a very simple container, what does
> > >
> > >   echo $LC_ALL
> > >   echo $LANG

It's actually set to en_US just before the build.

> > >
> > > prompt? If it is latin, change it to something using utf-8 (I recommend
> > > 'en_US.utf8').
> > >
> > > A UnicodeEncodeError can occur anywhere characters are
> > > encoded from (internal) unicode to the encoding of the stream.
> > >
> > > For example:
> > >
> > >   A print or log statement which streams to stdout needs to encode
> > >   from unicode to stdout's encoding. If there is one unicode symbol
> > >   which cannot be encoded to the stream's encoding, a UnicodeEncodeError
> > >   is raised.
> >
> > Hi Markus,
> >
> > The builder's locale shouldn't matter when building the Kernel
> > documentation (or any other documents built from other git trees
> > on other open source projects), as the Kernel's *.rpm document charset
> > won't change, no matter on what part of the globe it was built.
> >
> > I vaguely remember a change we made a couple of years ago
> > in order to address this issue.
>
> Hi Mauro :)
>
> sure? .. what if the logger wants to log some symbols from the
> Chinese translated parts to stdout and the encoding of stdout is
> latin?
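
The case Markus describes can be reproduced without touching the real stdout. The following is only a minimal sketch: `try_write` is a hypothetical helper, not anything from the kernel's documentation build, and it uses an in-memory stream with an explicit encoding to stand in for a latin-1 stdout. It also shows that the exception object carries the offending characters, which may help when the traceback alone does not say where they came from.

```python
import io


def try_write(text, encoding, errors="strict"):
    """Write text to an in-memory text stream that uses the given encoding.

    Hypothetical helper for illustration only. Returns ("ok", encoded_bytes)
    on success, or ("error", offending_characters) if encoding fails.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        # exc.object is the string being encoded; exc.start and exc.end
        # delimit the run of characters the codec could not encode.
        return "error", exc.object[exc.start:exc.end]
    return "ok", raw.getvalue()


if __name__ == "__main__":
    # ASCII-only output, so this runs even where stdout is latin-1.
    print(ascii(try_write("a\u2191b", "latin-1")))
    print(ascii(try_write("a\u2191b", "utf-8")))
    print(ascii(try_write("a\u2191b", "latin-1", errors="backslashreplace")))
```

With the default `errors="strict"` the latin-1 stream rejects the arrow, while a UTF-8 stream or a latin-1 stream with `errors="backslashreplace"` accepts the same text.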
[ 127s] + cd linux-5.12-next-20210506
[ 127s] + export LANG=en_US
[ 127s] + LANG=en_US
[ 127s] + mkdir -p html
[ 127s] + python3 -c 'print("↑ᛏ个")'
[ 127s] ↑ᛏ个
[ 127s] + echo 'print("↑ᛏ个")'
[ 127s] + python3 test.py
[ 127s] Traceback (most recent call last):
[ 127s]   File "test.py", line 1, in <module>
[ 127s]     print("\u2191\u16cf\u4e2a\uf8f9")
[ 127s] UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

It certainly does not look like python can print unicode in this
environment. It tells me where the problem is, though.

Thanks

Michal

[ 127s] + :
[ 127s] + locale
[ 128s] LANG=en_US
[ 128s] LC_CTYPE="en_US"
[ 128s] LC_NUMERIC="en_US"
[ 128s] LC_TIME="en_US"
[ 128s] LC_COLLATE="en_US"
[ 128s] LC_MONETARY="en_US"
[ 128s] LC_MESSAGES="en_US"
[ 128s] LC_PAPER="en_US"
[ 128s] LC_NAME="en_US"
[ 128s] LC_ADDRESS="en_US"
[ 128s] LC_TELEPHONE="en_US"
[ 128s] LC_MEASUREMENT="en_US"
[ 128s] LC_IDENTIFICATION="en_US"
[ 128s] LC_ALL=
[ 128s] + echo LC_ALL=
[ 128s] LC_ALL=
[ 128s] + echo LANG=en_US
[ 128s] LANG=en_US
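
To check from inside Python what the locale above translates to, something along these lines could be dropped into the build script. It is only a sketch: `describe_output_encoding` and `can_encode` are hypothetical names made up for this illustration, and the only library calls used are `locale.getpreferredencoding()` and `str.encode()`.

```python
import locale
import sys


def describe_output_encoding(stream=None):
    """Report the encoding settings of a text stream (default: sys.stdout).

    Hypothetical diagnostic helper, not part of any build tooling.
    """
    stream = sys.stdout if stream is None else stream
    return {
        "stream_encoding": getattr(stream, "encoding", None),
        "stream_errors": getattr(stream, "errors", None),
        # Encoding Python derives from the locale settings (LANG, LC_ALL, ...).
        "locale_encoding": locale.getpreferredencoding(False),
    }


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    info = describe_output_encoding()
    print(info)
    # ascii() keeps this line printable even on a latin-1 stdout.
    sample = "\u2191\u16cf\u4e2a"
    print(ascii(sample), "encodable:", can_encode(sample, info["stream_encoding"] or "ascii"))
```

If this reports a latin-1 style encoding, the options would be a UTF-8 locale as Markus suggested ('en_US.utf8'), or, as an assumption about what the build setup allows, exporting `PYTHONIOENCODING=utf-8` before invoking python3 so that the standard streams use UTF-8 regardless of the locale.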