On 6/5/23 15:15, Alejandro Colomar wrote:
> Hi,
>
> On 6/5/23 14:34, Yedidyah Bar David wrote:
>> Hi,
>>
>> On Mon, Jun 5, 2023 at 2:35 PM Alejandro Colomar <alx.manpages@xxxxxxxxx> wrote:
>>>
>>> Hi Yedidyah,
>>>
>>> On 6/5/23 13:17, Yedidyah Bar David wrote:
>>>> Clarify that the behavior of these functions is undefined if c is
>>>> neither in the unsigned char range nor EOF.
>>>>
>>>> I copied the added text from toupper.3.
>>>>
>>>> In practice, calling them on out-of-range values - tested with recent
>>>> glibc/gcc - is simply reading from random process memory - meaning, you
>>>> either get some garbage, if the memory was readable, or a segmentation
>>>> fault. See also:
>>>>
>>>> https://stackoverflow.com/questions/65514890/glibcs-isalpha-function-and-the-en-us-utf-8-locale
>>>>
>>>> Signed-off-by: Yedidyah Bar David <didi@xxxxxxxxxx>
>>>
>>> This is already covered by the NOTES section, isn't it?
>>
>> It's _mentioned_ there, correct - but not sure it's covered.
>
> You're right.  That's why I've sent the patch mentioning UB.
> What do you think about that one?  (I now see that you like it).
>
>>
>> It's also mentioned in toupper.3's NOTES.
>
> I'll check that page to see if it needs some simplifying.
>
>>
>> I think it's helpful to explicitly say that behavior is undefined in this case.
>
> Yep.
>
>> If you feel like doing this inside NOTES, one way or another, ok for me.
>>
>> Right now, NOTES says what you must do, but not what happens if you
>> don't do that.
>>
>> It also says that for the common case of using them on signed char, you should
>> explicitly cast to unsigned char, first. It also tries to explain why this is
>> necessary. The explanation explains why it's necessary for compliance with the
>> standard, but not why it's a good thing more generally - latter is not
>> explained, and AFAICT from reading glibc sources, is not necessary - see
>> e.g. this comment from ctype.h:
>>
>>    These point into arrays of 384, so they can be indexed by any `unsigned
>>    char' value [0,255]; by EOF (-1); or by any `signed char' value
>>    [-128,-1).  ISO C requires that the ctype functions work for `unsigned
>>    char' values and for EOF; we also support negative `signed char' values
>>    for broken old programs.
>
> Consider what happens with character 0xFF.  If char is signed, it will be
> interpreted as -1 (i.e., EOF).  We're lucky, because 0xFF is not a meaningful
> char, so probably all isXXX() functions return false for it, but it's slightly
> different from EOF semantically.  If no locales give a meaning for 0xFF, maybe
> the cast can be removed from ISO C.  I do something different: use
> -funsigned-char when compiling, so char is effectively unsigned char (except
> that pointers do not convert automatically).
>
>>
>> The real reason why you should not call them on negative values other than
>> EOF - casted to unsigned char or not - is simply that most likely this isn't
>> what you meant to do. But that's not about compliance with the standard...
>
> I guess the standard was cautious to not make 0xFF a useless char.  If that's
> not an issue, I agree, and these functions could do the conversion internally.

To be clear, I'm talking about this:

$ cat iscntrl.c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static const char  str_bool[2][8] = { "false", "true" };

static inline const char *
strbool(bool x)
{
	return str_bool[!!x];
}

int
main(void)
{
	signed char    s = 0xFF;
	unsigned char  u = 0xFF;

	printf("iscntrl(-1): %s\n", strbool(iscntrl(s)));
	printf("iscntrl(255): %s\n", strbool(iscntrl(u)));

	exit(EXIT_SUCCESS);
}
$ ./a.out
iscntrl(-1): false
iscntrl(255): false

>
> Cheers,
> Alex

--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
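
For reference, here is a minimal sketch of the cast that the NOTES discussion
above recommends, assuming the default "C" locale.  The helper names
char_iscntrl() and count_cntrl() are hypothetical, introduced only for
illustration; they are not part of glibc or ISO C.  Converting to unsigned
char first keeps the argument inside the range for which ISO C defines the
<ctype.h> functions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a glibc or ISO C function): classify a plain
 * `char` by converting it to `unsigned char` first, so that the value
 * passed to iscntrl() is always within [0, UCHAR_MAX]. */
bool
char_iscntrl(char c)
{
	return iscntrl((unsigned char) c) != 0;
}

/* Hypothetical helper: count the control characters in a
 * NUL-terminated string, using the conversion above for every byte. */
size_t
count_cntrl(const char *s)
{
	size_t  n = 0;

	for (; *s != '\0'; s++)
		n += char_iscntrl(*s);

	return n;
}
```

With this conversion, a byte such as 0xFF stored in a signed char reaches
iscntrl() as 255 rather than -1, so it is not conflated with EOF.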