Hi Matthew,

On Wed, Dec 06, 2023 at 01:33:50PM -0500, Matthew House wrote:
> On Wed, Dec 6, 2023 at 11:36 AM Alejandro Colomar <alx@xxxxxxxxxx> wrote:
> > Also, I was going to ask for strtoi(3bsd) in glibc, since strtol(3)
> > isn't easy to use portably (since POSIX allows EINVAL on no conversion,
> > how to differentiate strtoi(3bsd)'s ECANCELED from EINVAL in strtol(3)?).
>
> I feel like this is rather overstating the difficulty. In practice, the
> no-conversion condition is very commonly detected by checking whether
> *endptr == nptr after the call. The usual idiom I see is something like:
>
>     char *end;
>     errno = 0;
>     value = strtol(ptr, &end, 10);
>     if (end == ptr || *end != '\0' || errno == ERANGE)

That test could trigger UB if you passed an unsupported base.  Of
course, in this case you pass 10, but what if the base was a
user-controlled variable?  In such a case, nothing says what happens to
'end' (experimentally, I see it is not modified, so it would be left
uninitialized); so dereferencing it, or even comparing it, would be UB.

>         goto err;

Yeah, if you just don't care and want to handle all errors in the same
way, and you know the base is supported, this is correct.  But what
happens when you want to differentiate the different errors?  Let's list
the possible errors, as per strtoi(3bsd):

ERRORS
     [ECANCELED]   The string did not contain any characters that were
                   converted.

     [EINVAL]      The base is not between 2 and 36 and does not contain
                   the special value 0.

     [ENOTSUP]     The string contained non-numeric characters that did
                   not get converted.  In this case, endptr points to the
                   first unconverted character.

     [ERANGE]      The given string was out of range; the value converted
                   has been clamped; or the range given was invalid, i.e.
                   lo > hi.

Let's see how strtol(3) handles these:

ECANCELED:
    strtol(3) has `end == ptr`.  But POSIX allows EINVAL.  But make
    sure you pass a supported base.

EINVAL:
    strtol(3) has EINVAL.  But what happens to end?
    It could be left unmodified (current glibc behavior); or could
    be set to ptr, since none of the string has been read.  If the
    former, it's easy to trigger UB.  If the latter, it is
    indistinguishable from ECANCELED.

ENOTSUP:
    strtol(3) has `*end != '\0'`.  But make sure you pass a
    supported base, or buy a protector for nasal demons.

ERANGE:
    strtol(3) has ERANGE; same as strtoi().

In the end, it amounts to saying: "the behavior of strtol(3) is
undefined if the base is unsupported; don't bother to test EINVAL: don't
trigger it".  Which is fine, but we need to clarify that, because if
someone actually needs to use a non-standard base, they should be very
careful, and set end=NULL before the call (but there are no guarantees
that end is not modified either, so...).  Or better, provide strtoi(3)
and compare (err != 0), or (err != 0 && err != E***) if you explicitly
allow some error.

>
> Of course, the *end != '\0' condition can be omitted or adapted as
> necessary. Alternatively, one can avoid checking errno at all, by just
> checking whether the value is in the permitted range, since the saturating
> behavior will make such a check reject on overflow. And even without an
> explicit permitted range, one can just reject on value == LONG_MIN ||
> value == LONG_MAX, or just on value == ULONG_MAX for strtoul(3); rejecting
> a value that's almost an overflow isn't going to harm anything, except for
> the rare scenarios where a printed integer can actually reach the minimum
> or maximum, but needs to be round-tripped unconditionally.
>
> In general, I don't think most programmers are in the habit of carefully
> distinguishing errno values for <string.h> functions. They'd rather check
> for self-explanatory conditions, such as *endptr == nptr, that readers
> don't have to refer to the man page to decipher. There's a reason that most
> high-level language bindings return errno values for file I/O but not for
> anything else.
>
> Thank you,
> Matthew House

-- 
<https://www.alejandro-colomar.es/>
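
For illustration, here is a minimal sketch of how the four strtoi(3bsd)
error cases listed above could be told apart on top of strtol(3).
parse_long() is a hypothetical helper, not an existing libc function.
It assumes the only safe way to deal with an unsupported base is to
reject it before the call, so that 'end' is never read while it may be
indeterminate.  The order of the checks is also an assumption, and need
not match what strtoi(3bsd) does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of any libc): wraps strtol(3) and
 * returns 0 on success, or an errno-style code modeled on the
 * strtoi(3bsd) ERRORS list.  Unlike strtoi(3bsd), it clobbers errno.
 */
int
parse_long(const char *s, int base, long *out)
{
    char *end;
    long v;
    int err;

    assert(s != NULL && out != NULL);

    /* Check the base ourselves: what strtol(3) does to 'end' for an
     * unsupported base is not something we can rely on. */
    if (base != 0 && (base < 2 || base > 36))
        return EINVAL;

    errno = 0;
    v = strtol(s, &end, base);
    err = errno;

    /* No conversion.  POSIX allows errno == EINVAL here, so test the
     * pointer rather than errno. */
    if (end == s)
        return ECANCELED;

    *out = v;

    /* Out of range: v has been clamped to LONG_MIN or LONG_MAX. */
    if (err == ERANGE)
        return ERANGE;

    /* Trailing characters were left unconverted. */
    if (*end != '\0')
        return ENOTSUP;

    return 0;
}
```

With something like this, a caller can compare (err != 0), or
(err != 0 && err != ENOTSUP) if trailing text is explicitly allowed.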