Re: [PATCH] isalpha.3: behavior is undefined if c is out-of-range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Mon, Jun 5, 2023 at 2:35 PM Alejandro Colomar <alx.manpages@xxxxxxxxx> wrote:
>
> Hi Yedidyah,
>
> On 6/5/23 13:17, Yedidyah Bar David wrote:
> > Clarify that the behavior of these functions is undefined if c is
> > neither in the unsigned char range nor EOF.
> >
> > I copied the added text from toupper.3.
> >
> > In practice, calling them on out-of-range values - tested with recent
> > glibc/gcc - is simply reading from random process memory - meaning, you
> > either get some garbage, if the memory was readable, or a segmentation
> > fault. See also:
> >
> > https://stackoverflow.com/questions/65514890/glibcs-isalpha-function-and-the-en-us-utf-8-locale
> >
> > Signed-off-by: Yedidyah Bar David <didi@xxxxxxxxxx>
>
> This is already covered by the NOTES section, isn't it?

It's _mentioned_ there, correct - but not sure it's covered.

It's also mentioned in toupper.3's NOTES.

I think it's helpful to explicitly say that behavior is undefined in this case.
If you feel like doing this inside NOTES, one way or another, ok for me.

Right now, NOTES says what you must do, but not what happens if you
don't do that.

It also says that for the common case of using them on signed char, you should
explicitly cast to unsigned char, first. It also tries to explain why this is
necessary. The explanation explains why it's necessary for compliance with the
standard, but not why it's a good thing more generally - latter is not
explained,
and AFAICT from reading glibc sources, is not necessary - see e.g. this comment
from ctype.h:

   These point into arrays of 384, so they can be indexed by any `unsigned
   char' value [0,255]; by EOF (-1); or by any `signed char' value
   [-128,-1).  ISO C requires that the ctype functions work for `unsigned
   char' values and for EOF; we also support negative `signed char' values
   for broken old programs.

The real reason why you should not call them on negative values other than
EOF - casted to unsigned char or not - is simply that most likely this isn't
what you meant to do. But that's not about compliance with the standard...

In my patch, I felt like "something should be done", but when deciding what
actually should be done, I decided to simply go with what's in toupper.3,
which seems good enough for me.

So, what do you think?

>  BTW, I'll
> rename that section to CAVEATS.

Not sure all of its current content is a caveat.

Best regards,

>
> Thanks,
> Alex
>
> > ---
> >  man3/isalpha.3 | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/man3/isalpha.3 b/man3/isalpha.3
> > index 443c2aa09..8ad997c29 100644
> > --- a/man3/isalpha.3
> > +++ b/man3/isalpha.3
> > @@ -145,6 +145,15 @@ is the special locale object
> >  .BR duplocale (3))
> >  or is not a valid locale object handle.
> >  .PP
> > +If
> > +.I c
> > +is neither an
> > +.I "unsigned char"
> > +value nor
> > +.BR EOF ,
> > +the behavior of these functions
> > +is undefined.
> > +.PP
> >  The list below explains the operation of the functions without
> >  the "_l" suffix;
> >  the functions with the "_l" suffix differ only in using the locale object
> > --
> > 2.31.1
> >
> > Best regards,
> > --
> > Didi
> >
>
> --
> <http://www.alejandro-colomar.es/>
> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



-- 
Didi





[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux