Re: [PATCH] sscanf.3: Remove term 'deprecated', and expand BUGS

Hi Matthew,

On Wed, Dec 06, 2023 at 01:33:50PM -0500, Matthew House wrote:
> On Wed, Dec 6, 2023 at 11:36 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote:
> > Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
> > isn't easy to use portably (since POSIX allows EINVAL on no conversion,
> > how to differentiate strtoi(3bsd)'s ECANCELED from EINVAL in strtol(3)?).
> 
> I feel like this is rather overstating the difficulty. In practice, the
> no-conversion condition is very commonly detected by checking whether
> *endptr == nptr after the call. The usual idiom I see is something like:
> 
>     char *end;
>     errno = 0;
>     value = strtol(ptr, &end, 10);
>     if (end == ptr || *end != '\0' || errno == ERANGE)

That test could trigger UB if you pass an unsupported base.  Of
course, in this case you pass 10, but what if the base were a
user-controlled variable?  In such a case, nothing says what happens to
'end' (experimentally, glibc does not modify it, so it would be left
uninitialized); so dereferencing it, or even comparing it, would be UB.

>         goto err;

Yeah, if you just don't care and want to handle all errors in the same
way, and you know the base is supported, this is correct.
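
A minimal sketch of that approach, assuming the base may come from the
user: check it against the values strtol(3) is specified to accept (0,
or 2 through 36) before the call, so the tests on 'end' only ever run on
a pointer that strtol(3) has actually set.  The names
base_is_supported() and parse_long() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: is 'base' one that strtol(3) must accept? */
static bool
base_is_supported(int base)
{
	return base == 0 || (base >= 2 && base <= 36);
}

/*
 * Hypothetical wrapper: parse all of 's' as a long in 'base'.
 * Returns 0 on success, -1 on any error (errors are not told apart).
 */
static int
parse_long(const char *s, int base, long *out)
{
	char  *end = NULL;  /* defined value, even if strtol(3) never sets it */
	long  value;

	/* Reject unsupported bases ourselves; 'end' might not be set. */
	if (!base_is_supported(base))
		return -1;

	errno = 0;
	value = strtol(s, &end, base);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -1;

	*out = value;
	return 0;
}
```

Every failure collapses into a single -1, which is exactly the "handle
all errors in the same way" case.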

But what happens when you want to distinguish between the errors?
Let's list the possible errors, as documented in strtoi(3bsd):

ERRORS
     [ECANCELED]        The string did not contain any characters that
                        were converted.

     [EINVAL]           The base is not between 2 and 36 and does not
                        contain the special value 0.

     [ENOTSUP]          The string contained non-numeric characters
                        that did not get converted.  In this case,
                        endptr points to the first unconverted
                        character.

     [ERANGE]           The given string was out of range; the value
                        converted has been clamped; or the range given
                        was invalid, i.e. lo > hi.

Let's see how strtol(3) handles these:

ECANCELED:
strtol(3) reports this with `end == ptr`.  But POSIX also allows it to
fail with EINVAL here.  And make sure you pass a supported base.

EINVAL:
strtol(3) fails with EINVAL.  But what happens to end?  It could be left
unmodified (current glibc behavior), or it could be set to ptr, since
none of the string has been read.  If the former, it's easy to trigger
UB.  If the latter, it is indistinguishable from ECANCELED (which POSIX
also allows to be reported as EINVAL).

ENOTSUP:
strtol(3) has `*end != '\0'`.  But make sure you pass a supported base,
or buy a protector for nasal demons.

ERANGE:
strtol(3) fails with ERANGE, same as strtoi(3bsd).
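
Putting the four mappings above together, a hypothetical wrapper
(strtol_status() is not an existing API) could report strtol(3) outcomes
using the strtoi(3bsd) error codes.  It is only a sketch: it handles
EINVAL by rejecting the base itself, precisely because 'end' cannot be
trusted after strtol(3) fails that way, and it tests `end == s` before
looking at errno, since POSIX allows no-conversion to be reported as
EINVAL.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: call strtol(3) and classify the result with the
 * codes that strtoi(3bsd) documents.  Returns 0 on full success, or
 * EINVAL, ECANCELED, ERANGE, or ENOTSUP.
 */
static int
strtol_status(const char *s, int base, long *out)
{
	char  *end;
	int   e;

	/* EINVAL: checked before the call, so 'end' is always set below. */
	if (base != 0 && (base < 2 || base > 36))
		return EINVAL;

	errno = 0;
	*out = strtol(s, &end, base);
	e = errno;

	if (end == s)          /* ECANCELED: nothing converted */
		return ECANCELED;
	if (e == ERANGE)       /* ERANGE: value clamped to LONG_MIN/LONG_MAX */
		return ERANGE;
	if (*end != '\0')      /* ENOTSUP: trailing unconverted characters */
		return ENOTSUP;
	return 0;
}
```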

In the end, it amounts to saying: "the behavior of strtol(3) is
undefined if the base is unsupported; don't bother to test EINVAL: don't
trigger it".  Which is fine, but we need to document that, because if
someone actually needs to use a base that might be unsupported, they
should be very careful, and set end=NULL before the call (but there is
no guarantee that strtol(3) leaves end unmodified either, so...).  Or
better, provide strtoi(3) and compare (err != 0), or
(err != 0 && err != E***) if you explicitly allow some error.
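
For comparison, a rough sketch of what a strtoi(3bsd)-style interface
buys the caller.  strtoi_sketch() is a hypothetical stand-in built on
strtoimax(3), not the NetBSD or libbsd implementation; it follows the
argument order of strtoi(3bsd) (string, endptr, base, lo, hi, status
pointer), reports a single status code, and always leaves *endp
defined, even on EINVAL.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strtoi(3bsd)-like function (a sketch, not the BSD code).
 * Stores 0 or one of EINVAL, ECANCELED, ERANGE, ENOTSUP in *rstatus,
 * and clamps the returned value to [lo, hi].
 */
static intmax_t
strtoi_sketch(const char *restrict s, char **restrict endp, int base,
              intmax_t lo, intmax_t hi, int *rstatus)
{
	char      *end;
	int       e;
	intmax_t  n;

	if (endp == NULL)
		endp = &end;

	/* Invalid base or invalid range: fail without parsing. */
	if ((base != 0 && (base < 2 || base > 36)) || lo > hi) {
		*endp = (char *) s;
		*rstatus = (lo > hi) ? ERANGE : EINVAL;
		return 0;
	}

	errno = 0;
	n = strtoimax(s, endp, base);
	e = errno;

	if (*endp == s)
		*rstatus = ECANCELED;
	else if (e == ERANGE || n < lo || n > hi)
		*rstatus = ERANGE;
	else if (**endp != '\0')
		*rstatus = ENOTSUP;
	else
		*rstatus = 0;

	/* Clamp, as strtoi(3bsd) documents for out-of-range input. */
	if (n < lo)
		return lo;
	if (n > hi)
		return hi;
	return n;
}
```

A strict caller then tests `err != 0`; a caller that tolerates trailing
text, say, tests `err != 0 && err != ENOTSUP`.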

> 
> Of course, the *end != '\0' condition can be omitted or adapted as
> necessary. Alternatively, one can avoid checking errno at all, by just
> checking whether the value is in the permitted range, since the saturating
> behavior will make such a check reject on overflow. And even without an
> explicit permitted range, one can just reject on value == LONG_MIN ||
> value == LONG_MAX, or just on value == ULONG_MAX for strtoul(3); rejecting
> a value that's almost an overflow isn't going to harm anything, except for
> the rare scenarios where a printed integer can actually reach the minimum
> or maximum, but needs to be round-tripped unconditionally.
> 
> In general, I don't think most programmers are in the habit of carefully
> distinguishing errno values for <string.h> functions. They'd rather check
> for self-explanatory conditions, such as *endptr == nptr, that readers
> don't have to refer to the man page to decipher. There's a reason that most
> high-level language bindings return errno values for file I/O but not for
> anything else.
> 
> Thank you,
> Matthew House
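
The errno-free variant described above can be sketched as follows,
assuming base 10 and an explicit range [lo, hi] with lo > LONG_MIN and
hi < LONG_MAX, so that the value strtol(3) saturates to on overflow
always falls outside it.  parse_in_range() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: accept 's' only if it is entirely a base-10
 * number within [lo, hi].  errno is never inspected; overflow returns
 * LONG_MIN or LONG_MAX, which the range test rejects, given the
 * assumption lo > LONG_MIN and hi < LONG_MAX.
 */
static bool
parse_in_range(const char *s, long lo, long hi, long *out)
{
	char  *end;
	long  value;

	value = strtol(s, &end, 10);  /* base 10 is always supported */
	if (end == s || *end != '\0')
		return false;
	if (value < lo || value > hi)  /* also rejects saturated overflow */
		return false;

	*out = value;
	return true;
}
```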

-- 
<https://www.alejandro-colomar.es/>
