[PATCH 2/6] tests: colcrt: fix reliance on EILSEQ in POSIX locale

Patrick Steinhardt <ps@xxxxxx> · Fri, 23 Aug 2019 12:16:59 +0200

The input file "crash1" in the colcrt/regressions test contains the
illegal byte sequence "\x94\x7e". While "\x7e" is '~', "\x94" is a stray
UTF-8 continuation byte and hence not a valid character. Thus, the test
assumes that getwc(3P) will return
`WEOF` and set `errno=EILSEQ`, causing colcrt(1) to abort reading the
stream and thus not print the trailing '~'.

This assumption holds just fine for glibc as it will dutifully report
EILSEQ, but musl libc will happily read the complete stream without
complaining about the illegal character. But in fact, as tests run with
LC_ALL=POSIX by default, glibc's behaviour is wrong while musl is right.
Quoting mbrtowc(3P) from POSIX.1-2017:

    [EILSEQ] An invalid character sequence is detected. In the POSIX locale an
             [EILSEQ] error cannot occur since all byte values are valid
             characters.

Fix the issue by running the colcrt tests with the C.UTF-8 locale, where
"\x94" is an invalid multibyte sequence and EILSEQ is therefore expected.
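The following is a minimal sketch of the locale-dependent check, assuming a C99 compiler and a system that provides a "C.UTF-8" locale. The helper rejects_with_eilseq() is hypothetical and not part of util-linux. It decodes the first character with mbrtowc(3P) under a given locale and reports whether decoding fails with EILSEQ.

```c
#include <errno.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if mbrtowc() rejects the first character of `bytes` with
 * EILSEQ under `locale`, 0 if it decodes it (or reports another
 * condition), and -1 if `locale` is not available on this system.
 */
static int rejects_with_eilseq(const char *locale, const char *bytes,
                               size_t len)
{
	mbstate_t st;
	wchar_t wc;
	size_t rc;

	/* Switch all categories, including LC_CTYPE, to the requested locale. */
	if (setlocale(LC_ALL, locale) == NULL)
		return -1;

	/* Start from the initial shift state. */
	memset(&st, 0, sizeof(st));
	errno = 0;
	rc = mbrtowc(&wc, bytes, len, &st);

	/* (size_t)-1 together with EILSEQ signals an invalid sequence. */
	return rc == (size_t)-1 && errno == EILSEQ;
}
```

Under C.UTF-8, rejects_with_eilseq("C.UTF-8", "\x94\x7e", 2) should return 1 regardless of the libc. Under "POSIX", going by the behaviour described above, glibc would return 1 while musl would return 0. That is exactly the divergence the test tripped over.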

Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
---
 tests/ts/colcrt/regressions | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ts/colcrt/regressions b/tests/ts/colcrt/regressions
index 394c4e823..2adeea3f3 100755
--- a/tests/ts/colcrt/regressions
+++ b/tests/ts/colcrt/regressions
@@ -24,7 +24,7 @@ ts_check_prog "timeout"
 
 check_input_file() {
 	ts_init_subtest ${1##*/}
-	timeout 2 $TS_CMD_COLCRT < $1 > $TS_OUTPUT 2>&1
+	LC_ALL=C.UTF-8 timeout 2 $TS_CMD_COLCRT < $1 > $TS_OUTPUT 2>&1
 	echo "return value: $?" >> $TS_OUTPUT
 	ts_finalize_subtest
 }
-- 
2.23.0