Hi, On 6/5/23 14:34, Yedidyah Bar David wrote: > Hi, > > On Mon, Jun 5, 2023 at 2:35 PM Alejandro Colomar <alx.manpages@xxxxxxxxx> wrote: >> >> Hi Yedidyah, >> >> On 6/5/23 13:17, Yedidyah Bar David wrote: >>> Clarify that the behavior of these functions is undefined if c is >>> neither in the unsigned char range nor EOF. >>> >>> I copied the added text from toupper.3. >>> >>> In practice, calling them on out-of-range values - tested with recent >>> glibc/gcc - is simply reading from random process memory - meaning, you >>> either get some garbage, if the memory was readable, or a segmentation >>> fault. See also: >>> >>> https://stackoverflow.com/questions/65514890/glibcs-isalpha-function-and-the-en-us-utf-8-locale >>> >>> Signed-off-by: Yedidyah Bar David <didi@xxxxxxxxxx> >> >> This is already covered by the NOTES section, isn't it? > > It's _mentioned_ there, correct - but not sure it's covered. You're right. That's why I've sent the patch mentioning UB. What do you think about that one? (I now see that you like it). > > It's also mentioned in toupper.3's NOTES. I'll check that page to see if it needs some simplifying. > > I think it's helpful to explicitly say that behavior is undefined in this case. Yep. > If you feel like doing this inside NOTES, one way or another, ok for me. > > Right now, NOTES says what you must do, but not what happens if you > don't do that. > > It also says that for the common case of using them on signed char, you should > explicitly cast to unsigned char, first. It also tries to explain why this is > necessary. The explanation explains why it's necessary for compliance with the > standard, but not why it's a good thing more generally - latter is not > explained, > and AFAICT from reading glibc sources, is not necessary - see e.g. this comment > from ctype.h: > > These point into arrays of 384, so they can be indexed by any `unsigned > char' value [0,255]; by EOF (-1); or by any `signed char' value > [-128,-1). ISO C requires that the ctype functions work for `unsigned > char' values and for EOF; we also support negative `signed char' values > for broken old programs. Consider what happens with character 0xFF. If char is signed, it will be interpreted as -1 (i.e., EOF). We're lucky, because 0xFF is not a meaningful char, so probably all isXXX() functions return false for it, but it's slightly different from EOF semantically. If no locales give a meaning for 0xFF, maybe the cast can be removed from ISO C. I do something different: use -funsigned-char when compiling, so char is effectively unsigned char (except that pointers do not convert automatically). > > The real reason why you should not call them on negative values other than > EOF - casted to unsigned char or not - is simply that most likely this isn't > what you meant to do. But that's not about compliance with the standard... I guess the standard was cautious to not make 0xFF a useless char. If that's not an issue, I agree, and these functions could do the conversion internally. Cheers, Alex > > In my patch, I felt like "something should be done", but when deciding what > actually should be done, I decided to simply go with what's in toupper.3, > which seems good enough for me. > > So, what do you think? > >> BTW, I'll >> rename that section to CAVEATS. > > Not sure all of its current content is a caveat. > > Best regards, > >> >> Thanks, >> Alex >> >>> --- >>> man3/isalpha.3 | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/man3/isalpha.3 b/man3/isalpha.3 >>> index 443c2aa09..8ad997c29 100644 >>> --- a/man3/isalpha.3 >>> +++ b/man3/isalpha.3 >>> @@ -145,6 +145,15 @@ is the special locale object >>> .BR duplocale (3)) >>> or is not a valid locale object handle. >>> .PP >>> +If >>> +.I c >>> +is neither an >>> +.I "unsigned char" >>> +value nor >>> +.BR EOF , >>> +the behavior of these functions >>> +is undefined. >>> +.PP >>> The list below explains the operation of the functions without >>> the "_l" suffix; >>> the functions with the "_l" suffix differ only in using the locale object >>> -- >>> 2.31.1 >>> >>> Best regards, >>> -- >>> Didi >>> >> >> -- >> <http://www.alejandro-colomar.es/> >> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 > > > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
Attachment:
OpenPGP_signature
Description: OpenPGP digital signature