Hi Paul, On Sun, Jul 07, 2024 at 07:30:43PM GMT, Paul Eggert wrote: > On 7/7/24 14:42, Alejandro Colomar wrote: > > On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote: > > > Also, “global variables” is not > > > right here. The C standard allows strtol, for example, to read and write an > > > internal static cache. (Yes, that would be weird, but it’s allowed.) > > > > That's not part of the API. A user must not access internal static > > cache > > Although true in the normal (sane) case, as an extension the implementation > can make such a static cache visible to the user, and in this case the > caller must not pass cache addresses as arguments to strtol. > > For other functions this point is not purely academic. For example, the C > standard specifies the signature "FILE *fopen(const char *restrict, const > char *restrict);". If I understand your argument correctly, it says that the > "restrict"s can be omitted there without changing the set of valid programs. No, I didn't say that restrict can be removed from fopen(3). What I say is that in functions that accept pointers that alias each other, those aliasing pointers should not be restrict. Usually, pointers that alias are accessed, and thus they are not specified as restrict, such as in memmove(3). However, a small set of functions accept pointers that alias each other, but one of them is never accessed; in those few cases, restrict was added to the parameters in ISO C, but I claim it would be better removed. We're lucky, and the small set of functions where this happens don't seem to use any state, so we don't need to care about implementations using internal buffers that are passed somehow to the user.. From ISO C, IIRC, the only examples are strtol(3) et al. Another example is Plan9's seprint(3) family of functions. However, Plan9 doesn't use restrict, so it doesn't have it. > But that can't be right, as omitting the "restrict"s would make the > following code be valid in any platform where sizeof(int)>1: > > char *p = (char *) &errno; > p[0] = 'r'; > p[1] = 0; > FILE *f = fopen (p, p); > > even though the current standard says this code is invalid. No, I wouldn't remove any of the restrict qualifiers in fopen(3). Only from pointers that alias an access(none) pointer. > > > “endptr access(write_only) ... *endptr access(none)” > > > > > > This is true for glibc, but it’s not necessarily true for all conforming > > > strtol implementations. If endptr is non-null, a conforming strtol > > > implementation can both read and write *endptr; > > > > It can't, I think. It's perfectly valid to pass an uninitialized > > endptr, which means the callee must not read the original value. > > Sure, but the callee can do something silly like "*endptr = p + 1; *endptr = > *endptr - 1;". That is, it can read *endptr after writing it, without any > undefined behavior. (And if the callee is written in assembly language it > can read *endptr even before writing it - but I digress.) But once you modify the pointer provenance, you don't care anymore about it. We need to consider the pointers that a function receives, which are the ones the callee needs to know their provenance. Of course, a callee knows what it does, and so doesn't need restrict in local variables. C23/N3220::6.7.4.1p9 says: An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.4.2, requires that all accesses to that object use, directly or indirectly, the value of that pointer. When you set *endptr = nptr + x, and use the lvalue **endptr, you're still accessing the object indirectly using the value of nptr. So, strtol(3) gets 4 objects, let's call them A, B, C, and D. A is gotten via its pointer nptr. B is gotten via its pointer endptr. C is gotten via its pointer *endptr. D is gotten via the global variable errno. Object A may be the same as object C. Object B is unique inside the callee. Its pointer endptr must be restrict-qualified to denote its uniqueness. Object D is unique, but there's no way to specify that. Object C must NOT be read or written. The function is of course allowed to set *endptr to whatever it likes, and then access it however it likes, but object C must still NOT be accessed, since its pointer may be uninitialized, and thus point to no object at all. Maybe I should use abstract names for the objects, to avoid confusing them with the pointer variables that are used to pass them? The formal definition of restrict refers to the "object into which it formerly [in the list of parameter declarations of a function definition] pointed". I'm not 100% certain, because this formal definition is quite unreadable, though. The more I read it, the less sure I am about it. BTW, I noticed something I didn't know: If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified This reads to me as "const variables are not writable when they are accessed via a restricted pointer; casting away is not enough". Am I reading this correctly? > The point is that it is not correct to say that *endptr cannot be read from; > it can. Similarly for **endptr. Some better wording: the object pointed-to by *endptr at function entry cannot be accessed. > > Here, we need to consider two separate objects. The object pointed-to > > by *endptr _before_ the object pointed to by endptr is written to, and > > the object pointed-to by *endptr _after_ the object pointed to by endptr > > is written to. > > Those are not the only possibilities. The C standard also permits strtol to > set *endptr to some other pointer value, not pointing anywhere into the > string being scanned, so long as it sets *endptr correctly before it > returns. Let's reword. The initial object pointed-to by it, and everything else. > > > “The caller knows that errno doesn’t alias any of the function arguments.” > > > > > > Only because all args are declared with ‘restrict’. So if the proposal is > > > accepted, the caller doesn’t necessarily know that. > > > > Not really. The caller has created the string (or has received it via a > > restricted pointer) > > v0.2 doesn't state the assumption that the caller either created the string > or received it via a restricted pointer. If this assumption were stated > clearly, that would address the objection here. Ok. > > > “The callee knows that *endptr is not accessed.” > > > > > > This is true for glibc, but not necessarily true for every conforming strtol > > > implementation. > > > > The original *endptr may be uninitialized, and so must not be accessed. > > **endptr can be read once the callee sets *endptr. **endptr can even be > written, if the callee temporarily sets *endptr to point to a writable > buffer; admittedly this would be weird but it's allowed. The object originally pointed-to by *endptr (C) is what we care about. Subsequently reusing the same pointer variable for pointing to different objects is uninteresting for the purposes of knowing which objects are accessed and in which way. > > > “It might seem that it’s a problem that the callee doesn’t know if nptr can > > > alias errno or not. However, the callee will not write to the latter > > > directly until it knows it has failed,” > > > > > > Again this is true for glibc, but not necessarily true for every conforming > > > strtol implementation. > > > > An implementation is free to set errno = EDEADLK in the middle of it, as > > long as it later removes that. However, I don't see how it would make > > any sense. > > It could make sense in some cases. Here the spec is a bit tricky, but an > implementation is allowed to set errno = EINVAL first thing, and then set > errno to some other nonzero value if it determines that the arguments are > valid. I wouldn't implement strtol that way, but I can see where someone > else might do that. In any case an implementation is not obliged to pessimize strtol(3). It is only allowed to. Should we not allow them to do so? > > Let's find > > an ISO C function that accepts a non-restrict string: > > > > int system(const char *string); > > > > Does ISO C constrain implementations to support system((char *)&errno)? > > I don't think so. Maybe it does implicitly because of a defect in the > > wording, but even then it's widely understood that it doesn't. > > 'system' is a special case since the C standard says 'system' can do pretty > much anything it likes. That being said, I agree that implementations > shouldn't need to support calls like atol((char *) &errno). Certainly the C > standard's description of atol, which defines atol's behavior in terms of a > call to strtol, means that atol's argument in practice must follow the > 'restrict' rules. Let's take a simpler one: rename(2). Is it allowed to receive &errno? Hopefully not. > > Perhaps we should report this sort of thing as a defect in the standard. It > is odd, for example, that fopen's two arguments are both const char > *restrict, but system's argument lacks the "restrict". > Meh, I don't care enough, I think. > > > Why is this change worth > > > making? Real-world programs do not make calls like that. > > > > Because it makes analysis of 'restrict' more consistent. The obvious > > improvement of GCC's analyzer to catch restrict violations will trigger > > false positives in normal uses of strtol(3). > > v0.2 does not support this line of reasoning. On the contrary, v0.2 suggests > that a compiler should diagnose calls like "strtol(p, &p, 0)", which would > be wrong as that call is perfectly reasonable. That call is perfectly, reasonable, which is why I suggest that the standard should modify the prototype so that strtol(p, &p, 0), which is a reasonable call, should not be warned by a compiler that would diagnose such calls. That is, just by reading the prototypes: void foo(int *restrict x, int **restrict p); and void bar(int *x, int **restrict endp); one should be able to determine that foo(p, &p); is probably causing UB (and thus trigger a warning) but bar(p, &p); is fine. > > Another way to put it: v0.2 does not clearly state the advantages of the > proposed change, and in at least one area what it states as an advantage > would actually be a disadvantage. The advantage is having more information in the caller. As a caller, I want to distinguish calls where it's ok to pass pointers that alias, and where not. And I want my compiler to be able to help me there. If restrict is overapplied, then an analyzer cannot determine that. Or as Martin noted, it can, if it takes both the restrict qualifiers _and_ the access attributes into account, and performs some non-trivial deduction. I'd rather have a simple analyzer, which will provide for less false positives and negatives. > > > “m = strtol(p, &p, 0); An analyzer more powerful than the current ones > > > could extend the current -Wrestrict diagnostic to also diagnose this case.” > > > > > > Why would an analyzer want to do that? This case is a perfectly normal thing > > > to do and it has well-defined behavior. > > > > Because without an analyzer, restrict cannot emit many useful > > diagnostics. It's a qualifier that's all about data flow analysis, and > > normal diagnostics aren't able to do that. > > > > A qualifier that enables optimizations but doesn't enable diagnostics is > > quite dangerous, and probably better not used. If however, the analyzer > > emits advanced diagnostics for misuses of it, then it's a good > > qualifier. > > Sorry, but I don't understand what you're trying to say here. Really, I > can't make heads or tails of it. As-is, 'restrict' can be useful both for > optimization and for generating diagnostics, and GCC does both of these > things right now even if you don't use -fanalyzer. GCC can only catch the most obvious violations of restrict. #include <string.h> typedef struct { int x; } T; [[gnu::access(read_only, 1)]] [[gnu::access(read_only, 2)]] void replace(T *restrict *restrict ls, const T *restrict new, size_t pos) { memcpy(ls[pos], new, sizeof(T)); } void f(T *restrict *restrict ls) { replace(ls, ls[0], 1); } $ gcc-14 -Wall -Wextra -fanalyzer replace.c -S $ The above program causes UB, and uses an API that is so similar to strtol(3), that for writing an analyzer that triggers on this code, it will trigger on strtol(3) too. Only if it's smart enough to also consider the GNU access attributes, it will be able to differentiate the two cases, as Martin suggested. > Perhaps adding an example or two would help explain your point. But they'd > need to be better examples than what's in v0.2 because v0.2 is unclear about > this quality-of-diagnostics issue, as it relates to strtol. Maybe the above? Cheers, Alex -- <https://www.alejandro-colomar.es/>
Attachment:
signature.asc
Description: PGP signature