Re: [PATCH] isalpha.3: behavior is undefined if c is out-of-range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/5/23 15:15, Alejandro Colomar wrote:
> Hi,
> 
> On 6/5/23 14:34, Yedidyah Bar David wrote:
>> Hi,
>>
>> On Mon, Jun 5, 2023 at 2:35 PM Alejandro Colomar <alx.manpages@xxxxxxxxx> wrote:
>>>
>>> Hi Yedidyah,
>>>
>>> On 6/5/23 13:17, Yedidyah Bar David wrote:
>>>> Clarify that the behavior of these functions is undefined if c is
>>>> neither in the unsigned char range nor EOF.
>>>>
>>>> I copied the added text from toupper.3.
>>>>
>>>> In practice, calling them on out-of-range values - tested with recent
>>>> glibc/gcc - is simply reading from random process memory - meaning, you
>>>> either get some garbage, if the memory was readable, or a segmentation
>>>> fault. See also:
>>>>
>>>> https://stackoverflow.com/questions/65514890/glibcs-isalpha-function-and-the-en-us-utf-8-locale
>>>>
>>>> Signed-off-by: Yedidyah Bar David <didi@xxxxxxxxxx>
>>>
>>> This is already covered by the NOTES section, isn't it?
>>
>> It's _mentioned_ there, correct - but not sure it's covered.
> 
> You're right.  That's why I've sent the patch mentioning UB.
> What do you think about that one?  (I now see that you like it).
> 
>>
>> It's also mentioned in toupper.3's NOTES.
> 
> I'll check that page to see if it needs some simplifying.
> 
>>
>> I think it's helpful to explicitly say that behavior is undefined in this case.
> 
> Yep.
> 
>> If you feel like doing this inside NOTES, one way or another, ok for me.
>>
>> Right now, NOTES says what you must do, but not what happens if you
>> don't do that.
>>
>> It also says that for the common case of using them on signed char, you should
>> explicitly cast to unsigned char, first. It also tries to explain why this is
>> necessary. The explanation explains why it's necessary for compliance with the
>> standard, but not why it's a good thing more generally - latter is not
>> explained,
>> and AFAICT from reading glibc sources, is not necessary - see e.g. this comment
>> from ctype.h:
>>
>>    These point into arrays of 384, so they can be indexed by any `unsigned
>>    char' value [0,255]; by EOF (-1); or by any `signed char' value
>>    [-128,-1).  ISO C requires that the ctype functions work for `unsigned
>>    char' values and for EOF; we also support negative `signed char' values
>>    for broken old programs.
> 
> Consider what happens with character 0xFF.  If char is signed, it will be
> interpreted as -1 (i.e., EOF).  We're lucky, because 0xFF is not a meaningful
> char, so probably all isXXX() functions return false for it, but it's slightly
> different from EOF semantically.  If no locales give a meaning for 0xFF, maybe
> the cast can be removed from ISO C.  I do something different: use
> -funsigned-char when compiling, so char is effectively unsigned char (except
> that pointers do not convert automatically).
> 
>>
>> The real reason why you should not call them on negative values other than
>> EOF - casted to unsigned char or not - is simply that most likely this isn't
>> what you meant to do. But that's not about compliance with the standard...
> 
> I guess the standard was cautious to not make 0xFF a useless char.  If that's
> not an issue, I agree, and these functions could do the conversion internally.

To be clear, I'm talking about this:


$ cat iscntrl.c 
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static const char str_bool[2][8] = { "false", "true" };

static inline const char *
strbool(bool x)
{
	return str_bool[!!x];
}

int
main(void)
{
	signed char    s = 0xFF;
	unsigned char  u = 0xFF;

	printf("iscntrl(-1):  %s\n", strbool(iscntrl(s)));
	printf("iscntrl(255): %s\n", strbool(iscntrl(u)));

	exit(EXIT_SUCCESS);
}


$ ./a.out 
iscntrl(-1):  false
iscntrl(255): false


> 
> Cheers,
> Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux