Re: [PATCH 1/3] test-ctype: test isascii

René Scharfe <l.s.r@xxxxxx> · Sun, 12 Feb 2023 10:48:12 +0100

On 11.02.23 at 20:48, Junio C Hamano wrote:
> René Scharfe <l.s.r@xxxxxx> writes:
>
>> Test the character classifier added by c2e9364a06 (cleanup: add
>> isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
>> special treatment, as our string-based tester can't find it with
>> strcmp(3).  Allow NUL to be given as the first character in a class
>> specification string.  This has the downside of no longer supporting
>> the empty string, but that's OK since we are not interested in testing
>> character classes with no members.
>
> I wonder how effective a test we can have by checking a table we use
> in production (i.e. ctype.c::sane_ctype[]) against another table we
> use only for testing (i.e. string literals in test-ctype.c), but
> that is not something new in this series.

What aspect is left uncovered?

Or do you mean that the production table should be made trivially
readable to avoid having to test at all?

I on the other hand wonder if we really should add more and more
locale-ignoring classifiers.  Parsing object headers and such sure
requires that stability, but parsing commit messages and blob
payloads might be better done with locale-aware versions that
support multi-byte characters.

> I do not offhand know if the string literal prefixed with NUL is
> safe against clever compilers; my gut feeling says it should
> (i.e. allowing such an "optimization" does not seem to have much
> merit), but my gut has been wrong many times in this area, so...

Some compilers do despicable things in the name of optimization, but I
don't see the basis for truncating a string literal at the first NUL.

C99 standard section 6.4.5 (String literals) paragraph 5 has a footnote
that says: "A character string literal need not be a string (see 7.1.1),
because a null character may be embedded in it by a \0 escape sequence."
and 7.1.1 defines: "A string is a contiguous sequence of characters
terminated by and including the first null character."
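
To illustrate the two quotes, here is a minimal sketch.  It is not the
actual test-ctype.c code; the names spec, is_in and check_embedded_nul
are made up for this example.  It shows that the array behind a literal
with a leading \0 keeps all of its bytes, while the "string" in the
sense of 7.1.1 ends at the first NUL, and how a membership test can
treat a leading NUL as "NUL is a member":

```c
#include <assert.h>
#include <string.h>

/* Hypothetical class specification: NUL first, then 'a' and 'b'. */
static const char spec[] = "\0ab";

/*
 * Sketch of a membership test that accepts NUL as the first character
 * of the specification.  An empty specification is not supported:
 * it would look like a leading NUL with nothing valid after it.
 */
static int is_in(const char *s, int ch)
{
	if (ch == '\0')
		return *s == '\0';	/* NUL is a member only if listed first */
	if (*s == '\0')
		s++;			/* skip the leading NUL marker */
	return strchr(s, ch) != NULL;
}

void check_embedded_nul(void)
{
	/* The array holds '\0', 'a', 'b' and the implicit terminator. */
	assert(sizeof("\0ab") == 4);
	assert(sizeof(spec) == 4);

	/* As a string (7.1.1) it ends at the first null character. */
	assert(strlen(spec) == 0);

	/* The bytes after the embedded NUL are still there. */
	assert(spec[1] == 'a' && spec[2] == 'b' && spec[3] == '\0');

	assert(is_in(spec, '\0'));
	assert(is_in(spec, 'a'));
	assert(!is_in(spec, 'c'));
	assert(!is_in("ab", '\0'));
}
```

If a compiler truncated the literal at the first NUL, the sizeof and
spec[1] checks above would fail, which would contradict the footnote.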

René