Hi, On Mon, Jun 5, 2023 at 2:35 PM Alejandro Colomar <alx.manpages@xxxxxxxxx> wrote: > > Hi Yedidyah, > > On 6/5/23 13:17, Yedidyah Bar David wrote: > > Clarify that the behavior of these functions is undefined if c is > > neither in the unsigned char range nor EOF. > > > > I copied the added text from toupper.3. > > > > In practice, calling them on out-of-range values - tested with recent > > glibc/gcc - is simply reading from random process memory - meaning, you > > either get some garbage, if the memory was readable, or a segmentation > > fault. See also: > > > > https://stackoverflow.com/questions/65514890/glibcs-isalpha-function-and-the-en-us-utf-8-locale > > > > Signed-off-by: Yedidyah Bar David <didi@xxxxxxxxxx> > > This is already covered by the NOTES section, isn't it? It's _mentioned_ there, correct - but not sure it's covered. It's also mentioned in toupper.3's NOTES. I think it's helpful to explicitly say that behavior is undefined in this case. If you feel like doing this inside NOTES, one way or another, ok for me. Right now, NOTES says what you must do, but not what happens if you don't do that. It also says that for the common case of using them on signed char, you should explicitly cast to unsigned char, first. It also tries to explain why this is necessary. The explanation explains why it's necessary for compliance with the standard, but not why it's a good thing more generally - latter is not explained, and AFAICT from reading glibc sources, is not necessary - see e.g. this comment from ctype.h: These point into arrays of 384, so they can be indexed by any `unsigned char' value [0,255]; by EOF (-1); or by any `signed char' value [-128,-1). ISO C requires that the ctype functions work for `unsigned char' values and for EOF; we also support negative `signed char' values for broken old programs. The real reason why you should not call them on negative values other than EOF - casted to unsigned char or not - is simply that most likely this isn't what you meant to do. But that's not about compliance with the standard... In my patch, I felt like "something should be done", but when deciding what actually should be done, I decided to simply go with what's in toupper.3, which seems good enough for me. So, what do you think? > BTW, I'll > rename that section to CAVEATS. Not sure all of its current content is a caveat. Best regards, > > Thanks, > Alex > > > --- > > man3/isalpha.3 | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/man3/isalpha.3 b/man3/isalpha.3 > > index 443c2aa09..8ad997c29 100644 > > --- a/man3/isalpha.3 > > +++ b/man3/isalpha.3 > > @@ -145,6 +145,15 @@ is the special locale object > > .BR duplocale (3)) > > or is not a valid locale object handle. > > .PP > > +If > > +.I c > > +is neither an > > +.I "unsigned char" > > +value nor > > +.BR EOF , > > +the behavior of these functions > > +is undefined. > > +.PP > > The list below explains the operation of the functions without > > the "_l" suffix; > > the functions with the "_l" suffix differ only in using the locale object > > -- > > 2.31.1 > > > > Best regards, > > -- > > Didi > > > > -- > <http://www.alejandro-colomar.es/> > GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 -- Didi