Re: WG14 paper for removing restrict from nptr in strtol(3)

Alejandro Colomar <alx@xxxxxxxxxx> · Sun, 7 Jul 2024 14:42:54 +0200

Hi Paul,

On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
> On 7/7/24 03:58, Alejandro Colomar wrote:
> 
> > I've incorporated feedback, and here's a new revision, let's call it
> > v0.2, of the draft for a WG14 paper.
> Although I have not followed the email discussion closely, I read v0.2 and
> think that as stated there is little chance that its proposal will be
> accepted.

Thanks for reading thoroughly, and the feedback!

> Fundamentally the proposal is trying to say that there are two styles X and
> Y for declaring strtol and similar functions, and that although both styles
> are correct in some sense, style Y is better than style X. However, the
> advantages of Y are not clearly stated and the advantages of style X over Y
> are not admitted, so the proposal is not making the case clearly and fairly.
> 
> One other thing: to maximize the chance of a proposal being accepted, please
> tailor it for its expected readership. The C committee is expert on
> ‘restrict’, so don’t try to define ‘restrict’ in your own way. Unless merely
> repeating the language of the standard, any definition given for ‘restrict’
> is likely to cause the committee to quibble with the restatement of the
> standard wording. (It is OK to mention some corollaries of the standard
> definition, so long as the corollaries are not immediately obvious.)
> 
> Here are some comments about the proposal. At the start these comments are
> detailed; towards the end, as I could see the direction the proposal was
> headed and was convinced it wouldn’t be accepted as stated, the comments are
> less detailed.
> 
> 
> "The API may copy"
> 
> One normally doesn’t think of the application programming interface as
> copying. Please replace the phrase “the API” with “the caller” or “the
> callee” as appropriate. (Although ‘restrict’ can be used in places other
> than function parameters, I don’t think the proposal is concerned about
> those cases and so it doesn’t need to go into that.)

Ok.

> "To avoid violations of for example C11::6.5.16.1p3,"
> 
> Code that violates C11::6.5.16.1p3 will do so regardless of whether
> ‘restrict’ is present. I would not mention C11::6.5.16.1p3 as it’s a red
> herring. Fundamentally, ‘restrict’ is not about the consequences of caching
> when one does overlapping moves; it’s about caching in a more general sense.

The violations are UB regardless of restrict, but consistent use of
restrict allows the caller to have a rough model of what the callee will
do with the objects, and prevent those violations via compiler
diagnostics.  I've reworded that part to make it more clear why I'm
mentioning that.

> “As long as an object is only accessed via one restricted pointer, other
> restricted pointers are allowed to point to the same object.”
> 
> “only accessed” → “accessed only”

Ok.

> “This is less strict than I think it should be, but this proposal doesn’t
> attempt to change that definition.”
> 
> I would omit this sentence and all similar sentences. Don’t distract the
> reader with other potential proposals. The proposal as it stands is
> complicated enough.

Ok.

> “return ca > a;”
> “return ca > *ap;”
> 
> I fail to understand why these examples are present. It’s not simply that
> nobody writes code like that: the examples are not on point. I would remove
> the entire programs containing them, along with the sections that discuss
> them. When writing to the C committee one can assume the reader is expert in
> ‘restrict’, there is no need for examples such as these.

Those are examples of how consistent use of restrict can --or could, in
the case of g()-- detect, via compiler diagnostics, (likely) violations
of seemingly unrelated parts of the standard, such as the referenced
C11::6.5.16.1p3, or in this case, C11::6.5.8p5.  

> “strtol(3) accepts 4 objects via pointer parameters and global variables.”
> 
> Omit the “(3)”, here and elsewhere, as the audience is the C standard
> committee.

The C standard committee doesn't know about historic use of (3)?  That
predates the standard, and they built on top of that (C originated in
Unix).  While they probably don't care about it anymore, I expect my
paper to be read by other audience, including GCC and glibc, and I
prefer to keep it readable for that audience.  I expect the standard
committee to at least have a rough idea of the existence of this syntax,
and respect it, even if they don't use it or like it.

> “accepts” is a strange word to use here: normally one says “accepts” to talk
> about parameters, not global variables.

The thing is, strtol(3) does not actually access *endptr.  I thought
that might cause more confusion than using "accepts".

> Also, “global variables” is not
> right here. The C standard allows strtol, for example, to read and write an
> internal static cache. (Yes, that would be weird, but it’s allowed.)

That's not part of the API.  A user must not access internal static
cache, and so the implementation is free to assume that it doesn't,
regardless of the use of restrict in the API, so it is not relevant for
the purpose of this discussion, I think.

> I
> suggest rephrasing this sentence to talk about accessing, not accepting.

I don't want to use accessing, for it would be inconsistent with later
saying that *endptr is not accessed.  However, I'm open to other
suggested terms that might be more appropriate than both.

> “endptr	access(write_only) ... *endptr access(none)”
> 
> This is true for glibc, but it’s not necessarily true for all conforming
> strtol implementations. If endptr is non-null, a conforming strtol
> implementation can both read and write *endptr;

It can't, I think.  It's perfectly valid to pass an uninitialized
endptr, which means the callee must not read the original value.

	char *end;
	strtol("0", &end, 0);

If strtol(3) would be allowed to read it, the user would need to
initialize it.

> it can also both read and
> write **endptr. (Although it would need to write before reading, reading is
> allowed.)

Here, we need to consider two separate objects.  The object pointed-to
by *endptr _before_ the object pointed to by endptr is written to, and
the object pointed-to by *endptr _after_ the object pointed to by endptr
is written to.

For the former (the original *endptr):

	Since *endptr might be uninitialized, strtol(3) must NOT access
	the object pointed to by an uninitialized pointer.

For the latter (the final *endptr):

	The callee cannot write to it, since the specification of the
	function is that the string will not be modified.  And in any
	case, such an access is ultimately derived from nptr, not from
	*endptr, so it does not matter for the discussion of *endptr.

Of course, that's derived from the specification of the function, and
not from its prototype, since ISO C doesn't provide such detailed
prototypes (since it doesn't have the [[gnu::access()]] attribute).  But
the standard must abide by its own specification of functions, anyway.

> “This qualifier helps catch obvious bugs such as strtol(p, p, 0) and
> strtol(&p, &p, 0) .”
> 
> No it doesn’t. Ordinary type checking catches those obvious bugs, and
> ‘restrict’ provides no additional help there. Please complicate the examples
> to make the point more clearly.

To be pedantic, I didn't specify the type of p, so it might be (void *),
and thus avoid type checking at all.  However, to avoid second guessing
from the standards committee, I'll add casts, to make it more obvious
that restrict is catching those.

> “The caller knows that errno doesn’t alias any of the function arguments.”
> 
> Only because all args are declared with ‘restrict’. So if the proposal is
> accepted, the caller doesn’t necessarily know that.

Not really.  The caller has created the string (or has received it via a
restricted pointer), and so it knows it's not derived from errno.

	char buf[LINE_MAX + 1];

	fgets(...);
	n = strtol(buf, ...);

This caller knows with certainty that errno does not alias buf.  Of
course, in some complex cases, it might not know, but I ommitted that
for simplicity.  And in any case, I don't think any optimizations are
affected by that in the caller.

> 
> 
> “The callee knows that *endptr is not accessed.”
> 
> This is true for glibc, but not necessarily true for every conforming strtol
> implementation.

The original *endptr may be uninitialized, and so must not be accessed.

> “It might seem that it’s a problem that the callee doesn’t know if nptr can
> alias errno or not. However, the callee will not write to the latter
> directly until it knows it has failed,”
> 
> Again this is true for glibc, but not necessarily true for every conforming
> strtol implementation.

An implementation is free to set errno = EDEADLK in the middle of it, as
long as it later removes that.  However, I don't see how it would make
any sense.

If that's done, it's probably done via a helper internal function, which
as said below, can use restrict for nptr, and thus know with certainty
that nptr is distinct from errno.

If that's done directly in the body of strtol(3) (the only place where
it's not known that nptr is distinct from errno) we can probably agree
that the implementation is doing that just for fun, and doesn't care
about optimization, and thus we can safely ignore it.

> To my mind this is the most serious objection. The current standard
> prohibits calls like strtol((char *) &errno, 0, 0). The proposal would relax
> the standard to allow such calls. In other words, the proposal would
> constrain implementations to support such calls.

I don't think it does.  ISO C specifies that strtol(3) takes a string as
its first parameter, and errno is not (unless you do this:).

	(char *)&errno = "111";

Okay, let's assume you're allowed to do that, since a char* can alias
anything.

I still don't think ISO C constrains implementations to allow passing
(char *)&errno as a char*, just because it's not restrict.  Let's find
an ISO C function that accepts a non-restrict string:

	int system(const char *string);

Does ISO C constrain implementations to support system((char *)&errno)?
I don't think so.  Maybe it does implicitly because of a defect in the
wording, but even then it's widely understood that it doesn't.

> Why is this change worth
> making? Real-world programs do not make calls like that.

Because it makes analysis of 'restrict' more consistent.  The obvious
improvement of GCC's analyzer to catch restrict violations will trigger
false positives in normal uses of strtol(3).

> “But nothing prohibits those internal helper functions to specify that nptr
> is restrict and thus distinct from errno.”
> 
> Although true, it’s also the case that the C standard does not *require*
> internal helper functions to use ‘restrict’. All that matters is the
> accesses. So I’m not sure what the point of this statement is.

If an implementation wants to optimize, it should be careful and use
restrict.  If it doesn't, then it can't complain that ISO C doesn't
allow it to.  It's actually allowed to optimize, but it has to do some
work for it.

> “m = strtol(p, &p, 0); An analyzer more powerful than the current ones
> could extend the current -Wrestrict diagnostic to also diagnose this case.”
> 
> Why would an analyzer want to do that? This case is a perfectly normal thing
> to do and it has well-defined behavior.

Because without an analyzer, restrict cannot emit many useful
diagnostics.  It's a qualifier that's all about data flow analysis, and
normal diagnostics aren't able to do that.

A qualifier that enables optimizations but doesn't enable diagnostics is
quite dangerous, and probably better not used.  If however, the analyzer
emits advanced diagnostics for misuses of it, then it's a good
qualifier.

Have a lovely day!
Alex

> 
> “To prevent triggering diagnostics in a powerful analyzer that would be
> smart enough to diagnose the example function g(), the prototype of
> strtol(3) should be changed to ‘long int strtol(const char *nptr, char
> **restrict endptr, int base);’”
> 
> Sorry, but the case has not been made to make any such change to strtol’s
> prototype. On the contrary, what I’m mostly gathering from the discussion is
> that ‘restrict’ can be confusing, which is not news.
> 
> n3220 §6.7.4.2 examples 5 through 7 demonstrate that the C committee has
> thought through the points you’re making. (These examples were not present
> in C11.) This may help to explain why the standard specifies strtol with
> ‘restrict’ on both arguments.
> 

-- 
<https://www.alejandro-colomar.es/>
Attachment:
signature.asc

Description: PGP signature