Re: restrictness of strtoi(3bsd) and strtol(3)

Alejandro Colomar via Gcc-help <gcc-help@xxxxxxxxxxx> · Sat, 2 Dec 2023 13:29:01 +0100

On Sat, Dec 02, 2023 at 12:50:28PM +0100, Alejandro Colomar wrote:
> Hi,
> 
> I've been implementing my own copy of strto[iu](3bsd), to avoid the
> complexity of calling strtol(3) et al.  In the process, I've noticed
> that all of these functions use restrict for their parameters.
> 
> Why do these functions use restrict?  While the second parameter is not
> used for accessing nptr memory (**endptr is not accessed), *endptr can
> point to the same memory.  Here is an example of how these functions can
> have pointers to the same memory in the two arguments.
> 
> 	l = strtol(p, &p, 0);
> 
> The use of restrict in the prototype of the function could result in
> compiler warnings, no?  Currently, I don't see any warnings, but I
> suspect the compiler could complain, since the same memory is available
> to the function via two different arguments (albeit at different
> levels of indirection).
> 
> The use of restrict in the definition of the function doesn't help the
> optimizer, since it already knows that the second parameter is out-only,
> so even if it weren't restrict, the only way to access memory is via the
> first parameter.
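
For concreteness, here is a minimal sketch of that calling pattern in a
loop.  The function name sum_longs() is made up for illustration; only
strtol(3) itself is real.  It assumes the input is a writable string of
whitespace-separated integers, and it ignores overflow.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sum the whitespace-separated integers in s.
 * Each iteration passes p as nptr and &p as endptr, so the string is
 * reachable through both arguments (at different levels of indirection).
 */
long
sum_longs(char *s)
{
	char  *p = s;
	long  sum = 0;

	assert(s != NULL);

	for (;;) {
		char  *start = p;
		long  l;

		l = strtol(p, &p, 0);
		if (p == start)  /* No digits were consumed: stop. */
			break;
		sum += l;
	}
	return sum;
}
```

Calling it on a buffer holding "10 0x10 010" exercises the decimal,
hexadecimal, and octal prefixes that base 0 accepts.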

In the case of strto[iu](3bsd), I have even more doubts.

Here's libbsd's version of it (omitting unimportant parts):

	$ grepc -tfd strtoi .
	./src/strtoi.c:intmax_t
	strtoi(const char *__restrict nptr,
	       char **__restrict endptr, int base,
	       intmax_t lo, intmax_t hi, int *rstatus)
	{
		...

		im = strtoimax(nptr, endptr, base);

		*rstatus = errno;
		errno = serrno;

		if (*rstatus == 0) {
			/* No digits were found */
			if (nptr == *endptr)
				*rstatus = ECANCELED;
			/* There are further characters after number */
			else if (**endptr != '\0')
				*rstatus = ENOTSUP;
		}

		...

		return im;
	}

Let's say the base is unsupported (e.g., -42), and *endptr initially
points to nptr-1.  Imagine this call:

	i = strtoimax(p + 1, &p, -42);

ISO C doesn't specify what happens if the base is neither 0 nor between
2 and 36, so the behavior is probably undefined in ISO C.

POSIX says it returns 0 and sets errno to EINVAL, but doesn't say what
happens to endptr.  I expect two possible implementations:

-  Leave endptr untouched.
-  Set *endptr = nptr.

Let's suppose it leaves endptr untouched (otherwise, it would be
impossible to portably differentiate an EINVAL due to unsupported base
from an EINVAL due to no digits in the string).

So, the test (nptr == *endptr) would be false (because p+1 != p), and
the code would go on to read **endptr through a pointer that was not
derived from nptr, which is a violation of restrict.
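
To make the scenario concrete, here is a sketch of that code path under
loud assumptions.  stub_strtoimax() is a made-up stand-in, not any real
libc: it assumes an implementation that, for an unsupported base,
returns 0, leaves *endptr untouched, and does not set errno.
strtoi_tail() is also hypothetical: it mirrors the quoted tail of
strtoi(), drops restrict and lo/hi so the sketch itself is well defined,
guesses the elided errno save, and adds a deref flag that records
whether **endptr was read.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for strtoimax(3); only models a rejected base. */
static intmax_t
stub_strtoimax(const char *nptr, char **endptr, int base)
{
	(void) nptr;
	(void) endptr;  /* Assumption: *endptr is left untouched. */
	assert(base < 0 || base == 1 || base > 36);
	return 0;       /* Assumption: errno is not set. */
}

/* Hypothetical mirror of the quoted logic, without restrict. */
intmax_t
strtoi_tail(const char *nptr, char **endptr, int base,
            int *rstatus, int *deref)
{
	int       serrno;
	intmax_t  im;

	serrno = errno;  /* Assumed; the quoted code elides this part. */
	errno = 0;

	im = stub_strtoimax(nptr, endptr, base);

	*rstatus = errno;
	errno = serrno;
	*deref = 0;

	if (*rstatus == 0) {
		if (nptr == *endptr) {
			*rstatus = ECANCELED;
		} else {
			*deref = 1;  /* **endptr is read on this path. */
			if (**endptr != '\0')
				*rstatus = ENOTSUP;
		}
	}
	return im;
}
```

With p pointing to the start of a buffer such as "x42", a call like
strtoi_tail(p + 1, &p, -42, &status, &deref) takes the second branch and
reads the byte at p, which lies before nptr.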

I made many assumptions here, where the standards are not clear, so I
may be wrong in some of them.  But it looks to me like a bug.

CCing libbsd.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>