Re: [patch] atoi.3: Document return value on under/overflow as undefined

Hello Thomas,

On Sun, Dec 10, 2023 at 06:08:48AM -0800, thomas@xxxxxxxxx wrote:
> See patch below.
> 
> --
> typedef struct me_s {
>   char name[]      = { "Thomas Habets" };
>   char email[]     = { "thomas@xxxxxxxxx" };
>   char kernel[]    = { "Linux" };
>   char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt"; };
>   char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
>   char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
> } me_t;
> 
> 
> commit 095cc630082ea389d5f6657ce497e02d3dde0b21
> Author: Thomas Habets <thomas@xxxxxxxxx>
> Date:   Sun Dec 10 13:44:47 2023 +0000
> 
>     atoi.3: Document return value on under/overflow as undefined
> 
>     Before this change, the manpage is clear enough:
> 
>     ```
>     RETURN VALUE
>            The converted value or 0 on error.

For extra fun, you could have quoted this together :)

```
     except that atoi() does not detect errors.
```

>     […]
>     No checks for overflow or underflow are done.
>     ```
> 
>     This is not really true. atoi() uses strtol() to convert from string
>     to long, and the results may under or overflow a long, in which
>     case strtol() returns LONG_MIN and LONG_MAX, respectively.
> 
>     LONG_MIN cast to int is 0, which lives up to the manpage just fine
>     ("0 on error"), assuming underflow should be seen as an error.
> 
>     LONG_MAX cast to int is -1.
> 
>     POSIX says "The atoi() function shall return the converted value if
>     the value can be represented", so the current behavior doesn't
>     violate POSIX.
> 
>     But it is surprising, and arguably incorrectly documented in the
>     Linux manpages. There is, in fact, a range check, but against long,
>     not int.

We could say it's just an accident, and not an intentional check.
Something similar happens in sscanf(3).  Since values between INT_MAX
and LONG_MAX (or between LONG_MIN and INT_MIN) aren't covered by that
range check, let's say there's none, for simplicity.
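
To make that gap concrete, here is a minimal sketch.  The enum and the
classify() helper are made-up names for illustration only; they are not
part of glibc.  It assumes base-10 input and ignores trailing garbage.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical classification of how strtol(3) sees a decimal string. */
enum range_class {
	FITS_INT,        /* value is representable in int */
	FITS_LONG_ONLY,  /* strtol() reports no error, but value is outside int */
	OUT_OF_LONG      /* strtol() set errno to ERANGE */
};

enum range_class
classify(const char *s)
{
	long  l;

	errno = 0;
	l = strtol(s, NULL, 10);
	if (errno == ERANGE)
		return OUT_OF_LONG;     /* the only case strtol() itself detects */
	if (l < INT_MIN || l > INT_MAX)
		return FITS_LONG_ONLY;  /* the gap: no error, yet not an int */
	return FITS_INT;
}
```

On a platform where long is wider than int, a string such as
"3000000000" should land in FITS_LONG_ONLY, which is the case a plain
(int) conversion of the strtol() result never notices.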

>     "Error" is not defined in the manpage. Is over/underflow an
>     error?
> 
>     It's kinda handled, kinda not, with the effect that over and underflow
>     have different return values for atoi(), and for atol() proper range
>     checking is in fact being done by the implementation.
> 
>     It would be possible to document atol(3) to say that it actually does
>     range checking, but that seems like a bigger commitment than this
>     clarification.
> 
>     More thoughts from me on parsing and handling integers:
> 
>     https://blog.habets.se/2022/10/No-way-to-parse-integers-in-C.html
>     https://blog.habets.se/2022/11/Integer-handling-is-broken.html

Very interesting!

> 
>     Previously (incorrectly) filed as a bug here:
>     https://sourceware.org/bugzilla/show_bug.cgi?id=29753
> 
>     Signed-off-by: Thomas Habets <thomas@xxxxxxxxx>
> 
> diff --git a/man3/atoi.3 b/man3/atoi.3
> index f5fb5d0e1..7c005fc15 100644
> --- a/man3/atoi.3
> +++ b/man3/atoi.3
> @@ -111,7 +111,9 @@ only.
>  .I errno
>  is not set on error so there is no way to distinguish between 0 as an
>  error and as the converted value.
> -No checks for overflow or underflow are done.
> +The return value in case of under/overflow is undefined, but currently
> +atol() and atoll() return LONG_MIN/LONG_MAX and LLONG_MIN/LLONG_MAX,
> +respectively.

I don't want to document current behavior, since that behavior is
completely bogus, and better described as undefined.  Let curious
programmers find out how much undefined it is.

Also, it's not only the return value that is undefined; the entire
program behavior is undefined.  We're lucky that the compiler is
(likely) unable to see the UB, and so it can't freak out.

So, a patch should say the behavior is undefined if the value is not
representable in an int.

However, maybe we should instead try to fix glibc to do the right thing.

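	#include <errno.h>
	#include <limits.h>
	#include <inttypes.h>	/* strtoi(3): NetBSD; on Linux via libbsd */

	int
	atoi(const char *nptr)
	{
		int   i, err;

		/* Clamp to [INT_MIN, INT_MAX]; err is nonzero on failure. */
		i = strtoi(nptr, NULL, 10, INT_MIN, INT_MAX, &err);
		if (err)
			errno = err;
		return i;
	}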

This is compatible with ISO C, since it behaves like

	(int) strtol(nptr, NULL, 10);

"Except for the behavior on error", in which this atoi(3) implementation
sets errno, but nothing forbids that (ISO C only says "need not affect
the value of the integer expression errno on an error", which allows
affecting errno).  POSIX also allows this implementation: "except that
the handling of errors may differ".
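
For callers who can't wait for that, the same idea can be sketched on
top of strtol(3) alone.  This is only an illustration: parse_int() and
parse_int_or() are hypothetical names, not glibc or libbsd functions,
and the choice of EINVAL and ERANGE as return codes is my assumption
(it relies on POSIX errno values).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a base-10 string to int.
 * Returns 0 and stores the value in *out on success; returns
 * EINVAL or ERANGE, and leaves *out untouched, on failure.
 */
int
parse_int(const char *s, int *out)
{
	char  *end;
	long  l;
	int   saved, err;

	saved = errno;
	errno = 0;
	l = strtol(s, &end, 10);
	err = errno;
	errno = saved;          /* leave the caller's errno untouched */

	if (end == s || *end != '\0')
		return EINVAL;  /* no digits, or trailing garbage */
	if (err == ERANGE || l < INT_MIN || l > INT_MAX)
		return ERANGE;  /* not representable in int */

	*out = (int) l;
	return 0;
}

/* Hypothetical convenience wrapper: return fallback on any failure. */
int
parse_int_or(const char *s, int fallback)
{
	int  i;

	return parse_int(s, &i) == 0 ? i : fallback;
}
```

Unlike atoi(3), a caller of parse_int() can tell a converted 0 from a
failure, and an out-of-range value never reaches the (int) conversion.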

Have a lovely night,
Alex

>  Only base-10 input can be converted.
>  It is recommended to instead use the
>  .BR strtol ()
> 

-- 
<https://www.alejandro-colomar.es/>
