Re: restrictness of strtoi(3bsd) and strtol(3)

Alejandro Colomar via Gcc-help <gcc-help@xxxxxxxxxxx> · Sat, 2 Dec 2023 13:34:06 +0100

On Sat, Dec 02, 2023 at 01:29:01PM +0100, Alejandro Colomar wrote:
> On Sat, Dec 02, 2023 at 12:50:28PM +0100, Alejandro Colomar wrote:
> > Hi,
> > 
> > I've been implementing my own copy of strto[iu](3bsd), to avoid the
> > complexity of calling strtol(3) et al.  In the process, I've noticed
> > that all of these functions use restrict for their parameters.
> > 
> > Why do these functions use restrict?  While the second parameter is not
> > used for accessing nptr memory (**endptr is not accessed), it can point
> > to the same memory.  Here is an example of how these functions can have
> > pointers to the same memory in the two arguments.
> > 
> > 	l = strtol(p, &p, 0);
> > 
> > The use of restrict in the prototype of the function could result in
> > compiler warnings, no?  Currently, I don't see any warnings, but I
> > suspect the compiler could complain, since the same memory is available
> > to the function via two different arguments (albeit with a different
> > number of references).
> > 
> > The use of restrict in the definition of the function doesn't help the
> > optimizer, since it already knows that the second parameter is out-only,
> > so even if it weren't restrict, the only way to access memory is via the
> > first parameter.
> 
> In the case of strto[iu](3bsd), I have even more doubts.
> 
> Here's libbsd's version of it (omitting unimportant parts):
> 
> 	$ grepc -tfd strtoi .
> 	./src/strtoi.c:intmax_t
> 	strtoi(const char *__restrict nptr,
> 	       char **__restrict endptr, int base,
> 	       intmax_t lo, intmax_t hi, int *rstatus)
> 	{
> 		...
> 
> 		im = strtoimax(nptr, endptr, base);
> 
> 		*rstatus = errno;
> 		errno = serrno;
> 
> 		if (*rstatus == 0) {
> 			/* No digits were found */
> 			if (nptr == *endptr)
> 				*rstatus = ECANCELED;
> 			/* There are further characters after number */
> 			else if (**endptr != '\0')
> 				*rstatus = ENOTSUP;
> 		}
> 
> 		...
> 
> 		return im;
> 	}
> 
> Let's say the base is unsupported (e.g., -42), and endptr initially
> points to nptr-1.  Imagine this call:
> 
> 	i = strtoimax(p + 1, &p, -42);
> 
> ISO C doesn't specify what happens if the base is not between 0 and 36,
> so the behavior is probably undefined in ISO C.
> 
> POSIX says it returns 0 and sets errno to EINVAL, but doesn't say what
> happens to endptr.  I expect two possible implementations:
> 
> -  Leave endptr untouched.
> -  Set *endptr = nptr.
> 
> Let's suppose it leaves endptr untouched (otherwise, it would be
> impossible to portably differentiate an EINVAL due to unsupported base
> from an EINVAL due to no digits in the string).
> 
> So, the test (nptr == *endptr) would be false (because p+1 != p), and
> the code would jump into accessing **endptr without having derived
> that pointer from nptr, which is a violation of restrict.

Oops, that access is within the (*rstatus == 0) branch, that is, the path
where strtoimax(3) left errno at 0, so *endptr is guaranteed to be
derived from nptr there.
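
To make that concrete, here is a minimal sketch of the guarded logic.
It is not libbsd's code: parse_status() is a hypothetical name, the
lo/hi range checks are left out, and endptr is assumed to be non-NULL.
It also assumes the implementation reports an unsupported base through
errno (EINVAL), as POSIX specifies.

```c
#include <errno.h>
#include <inttypes.h>

/*
 * Hypothetical helper (not libbsd's strtoi): same status logic as the
 * excerpt quoted above, without the range checks.  endptr must not be
 * NULL in this sketch.
 */
static intmax_t
parse_status(const char *nptr, char **endptr, int base, int *rstatus)
{
	int       serrno;
	intmax_t  im;

	serrno = errno;
	errno = 0;
	im = strtoimax(nptr, endptr, base);
	*rstatus = errno;
	errno = serrno;

	/*
	 * *endptr is only read when strtoimax(3) reported no error, so
	 * at this point it was stored by strtoimax(3) and points into
	 * the string that starts at nptr.
	 */
	if (*rstatus == 0) {
		if (nptr == *endptr)
			*rstatus = ECANCELED;   /* no digits were found */
		else if (**endptr != '\0')
			*rstatus = ENOTSUP;     /* trailing characters */
	}

	return im;
}
```

With "42abc" in base 10, this should return 42, set *rstatus to ENOTSUP,
and leave *endptr pointing at "abc".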

So no bug, but it's still unclear to me what the benefit of using
restrict is, and also why GCC doesn't warn about it at the call site.
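
For reference, this is the kind of call site I have in mind, as a small
self-contained sketch.  sum_longs() is a hypothetical helper written for
this mail only.  As far as I can tell the call is well defined: the
object p is only reachable through endptr, and the characters are only
read through nptr, which might be why no diagnostic is issued, but I'm
not sure about that.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sums the integers at the start of s, separated
 * by white space, and stops at the first thing that doesn't parse.
 */
static long
sum_longs(const char *s)
{
	long  sum = 0;
	char  *p = (char *) s;  /* never written through; strtol(3) wants char ** */

	for (;;) {
		char  *start = p;
		long  l;

		errno = 0;
		l = strtol(p, &p, 0);   /* nptr is p, endptr is &p */
		if (p == start || errno != 0)
			break;          /* no digits, or out of range */
		sum += l;
	}

	return sum;
}
```

For example, sum_longs("0x10 -6") should return 10, and
sum_longs("7 x 9") should stop at the "x" and return 7.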

> I made many assumptions here, where the standards are not clear, so I
> may be wrong in some of them.  But it looks to me like a bug.
> 
> CCing libbsd.
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>

-- 
<https://www.alejandro-colomar.es/>