Hi Paul, On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote: > On 7/7/24 03:58, Alejandro Colomar wrote: > > > I've incorporated feedback, and here's a new revision, let's call it > > v0.2, of the draft for a WG14 paper. > Although I have not followed the email discussion closely, I read v0.2 and > think that as stated there is little chance that its proposal will be > accepted. Thanks for reading thoroughly, and the feedback! > Fundamentally the proposal is trying to say that there are two styles X and > Y for declaring strtol and similar functions, and that although both styles > are correct in some sense, style Y is better than style X. However, the > advantages of Y are not clearly stated and the advantages of style X over Y > are not admitted, so the proposal is not making the case clearly and fairly. > > One other thing: to maximize the chance of a proposal being accepted, please > tailor it for its expected readership. The C committee is expert on > ‘restrict’, so don’t try to define ‘restrict’ in your own way. Unless merely > repeating the language of the standard, any definition given for ‘restrict’ > is likely to cause the committee to quibble with the restatement of the > standard wording. (It is OK to mention some corollaries of the standard > definition, so long as the corollaries are not immediately obvious.) > > Here are some comments about the proposal. At the start these comments are > detailed; towards the end, as I could see the direction the proposal was > headed and was convinced it wouldn’t be accepted as stated, the comments are > less detailed. > > > "The API may copy" > > One normally doesn’t think of the application programming interface as > copying. Please replace the phrase “the API” with “the caller” or “the > callee” as appropriate. (Although ‘restrict’ can be used in places other > than function parameters, I don’t think the proposal is concerned about > those cases and so it doesn’t need to go into that.) Ok. > "To avoid violations of for example C11::6.5.16.1p3," > > Code that violates C11::6.5.16.1p3 will do so regardless of whether > ‘restrict’ is present. I would not mention C11::6.5.16.1p3 as it’s a red > herring. Fundamentally, ‘restrict’ is not about the consequences of caching > when one does overlapping moves; it’s about caching in a more general sense. The violations are UB regardless of restrict, but consistent use of restrict allows the caller to have a rough model of what the callee will do with the objects, and prevent those violations via compiler diagnostics. I've reworded that part to make it more clear why I'm mentioning that. > “As long as an object is only accessed via one restricted pointer, other > restricted pointers are allowed to point to the same object.” > > “only accessed” → “accessed only” Ok. > “This is less strict than I think it should be, but this proposal doesn’t > attempt to change that definition.” > > I would omit this sentence and all similar sentences. Don’t distract the > reader with other potential proposals. The proposal as it stands is > complicated enough. Ok. > “return ca > a;” > “return ca > *ap;” > > I fail to understand why these examples are present. It’s not simply that > nobody writes code like that: the examples are not on point. I would remove > the entire programs containing them, along with the sections that discuss > them. When writing to the C committee one can assume the reader is expert in > ‘restrict’, there is no need for examples such as these. Those are examples of how consistent use of restrict can --or could, in the case of g()-- detect, via compiler diagnostics, (likely) violations of seemingly unrelated parts of the standard, such as the referenced C11::6.5.16.1p3, or in this case, C11::6.5.8p5. > “strtol(3) accepts 4 objects via pointer parameters and global variables.” > > Omit the “(3)”, here and elsewhere, as the audience is the C standard > committee. The C standard committee doesn't know about historic use of (3)? That predates the standard, and they built on top of that (C originated in Unix). While they probably don't care about it anymore, I expect my paper to be read by other audience, including GCC and glibc, and I prefer to keep it readable for that audience. I expect the standard committee to at least have a rough idea of the existence of this syntax, and respect it, even if they don't use it or like it. > “accepts” is a strange word to use here: normally one says “accepts” to talk > about parameters, not global variables. The thing is, strtol(3) does not actually access *endptr. I thought that might cause more confusion than using "accepts". > Also, “global variables” is not > right here. The C standard allows strtol, for example, to read and write an > internal static cache. (Yes, that would be weird, but it’s allowed.) That's not part of the API. A user must not access internal static cache, and so the implementation is free to assume that it doesn't, regardless of the use of restrict in the API, so it is not relevant for the purpose of this discussion, I think. > I > suggest rephrasing this sentence to talk about accessing, not accepting. I don't want to use accessing, for it would be inconsistent with later saying that *endptr is not accessed. However, I'm open to other suggested terms that might be more appropriate than both. > “endptr access(write_only) ... *endptr access(none)” > > This is true for glibc, but it’s not necessarily true for all conforming > strtol implementations. If endptr is non-null, a conforming strtol > implementation can both read and write *endptr; It can't, I think. It's perfectly valid to pass an uninitialized endptr, which means the callee must not read the original value. char *end; strtol("0", &end, 0); If strtol(3) would be allowed to read it, the user would need to initialize it. > it can also both read and > write **endptr. (Although it would need to write before reading, reading is > allowed.) Here, we need to consider two separate objects. The object pointed-to by *endptr _before_ the object pointed to by endptr is written to, and the object pointed-to by *endptr _after_ the object pointed to by endptr is written to. For the former (the original *endptr): Since *endptr might be uninitialized, strtol(3) must NOT access the object pointed to by an uninitialized pointer. For the latter (the final *endptr): The callee cannot write to it, since the specification of the function is that the string will not be modified. And in any case, such an access is ultimately derived from nptr, not from *endptr, so it does not matter for the discussion of *endptr. Of course, that's derived from the specification of the function, and not from its prototype, since ISO C doesn't provide such detailed prototypes (since it doesn't have the [[gnu::access()]] attribute). But the standard must abide by its own specification of functions, anyway. > “This qualifier helps catch obvious bugs such as strtol(p, p, 0) and > strtol(&p, &p, 0) .” > > No it doesn’t. Ordinary type checking catches those obvious bugs, and > ‘restrict’ provides no additional help there. Please complicate the examples > to make the point more clearly. To be pedantic, I didn't specify the type of p, so it might be (void *), and thus avoid type checking at all. However, to avoid second guessing from the standards committee, I'll add casts, to make it more obvious that restrict is catching those. > “The caller knows that errno doesn’t alias any of the function arguments.” > > Only because all args are declared with ‘restrict’. So if the proposal is > accepted, the caller doesn’t necessarily know that. Not really. The caller has created the string (or has received it via a restricted pointer), and so it knows it's not derived from errno. char buf[LINE_MAX + 1]; fgets(...); n = strtol(buf, ...); This caller knows with certainty that errno does not alias buf. Of course, in some complex cases, it might not know, but I ommitted that for simplicity. And in any case, I don't think any optimizations are affected by that in the caller. > > > “The callee knows that *endptr is not accessed.” > > This is true for glibc, but not necessarily true for every conforming strtol > implementation. The original *endptr may be uninitialized, and so must not be accessed. > “It might seem that it’s a problem that the callee doesn’t know if nptr can > alias errno or not. However, the callee will not write to the latter > directly until it knows it has failed,” > > Again this is true for glibc, but not necessarily true for every conforming > strtol implementation. An implementation is free to set errno = EDEADLK in the middle of it, as long as it later removes that. However, I don't see how it would make any sense. If that's done, it's probably done via a helper internal function, which as said below, can use restrict for nptr, and thus know with certainty that nptr is distinct from errno. If that's done directly in the body of strtol(3) (the only place where it's not known that nptr is distinct from errno) we can probably agree that the implementation is doing that just for fun, and doesn't care about optimization, and thus we can safely ignore it. > To my mind this is the most serious objection. The current standard > prohibits calls like strtol((char *) &errno, 0, 0). The proposal would relax > the standard to allow such calls. In other words, the proposal would > constrain implementations to support such calls. I don't think it does. ISO C specifies that strtol(3) takes a string as its first parameter, and errno is not (unless you do this:). (char *)&errno = "111"; Okay, let's assume you're allowed to do that, since a char* can alias anything. I still don't think ISO C constrains implementations to allow passing (char *)&errno as a char*, just because it's not restrict. Let's find an ISO C function that accepts a non-restrict string: int system(const char *string); Does ISO C constrain implementations to support system((char *)&errno)? I don't think so. Maybe it does implicitly because of a defect in the wording, but even then it's widely understood that it doesn't. > Why is this change worth > making? Real-world programs do not make calls like that. Because it makes analysis of 'restrict' more consistent. The obvious improvement of GCC's analyzer to catch restrict violations will trigger false positives in normal uses of strtol(3). > “But nothing prohibits those internal helper functions to specify that nptr > is restrict and thus distinct from errno.” > > Although true, it’s also the case that the C standard does not *require* > internal helper functions to use ‘restrict’. All that matters is the > accesses. So I’m not sure what the point of this statement is. If an implementation wants to optimize, it should be careful and use restrict. If it doesn't, then it can't complain that ISO C doesn't allow it to. It's actually allowed to optimize, but it has to do some work for it. > “m = strtol(p, &p, 0); An analyzer more powerful than the current ones > could extend the current -Wrestrict diagnostic to also diagnose this case.” > > Why would an analyzer want to do that? This case is a perfectly normal thing > to do and it has well-defined behavior. Because without an analyzer, restrict cannot emit many useful diagnostics. It's a qualifier that's all about data flow analysis, and normal diagnostics aren't able to do that. A qualifier that enables optimizations but doesn't enable diagnostics is quite dangerous, and probably better not used. If however, the analyzer emits advanced diagnostics for misuses of it, then it's a good qualifier. Have a lovely day! Alex > > “To prevent triggering diagnostics in a powerful analyzer that would be > smart enough to diagnose the example function g(), the prototype of > strtol(3) should be changed to ‘long int strtol(const char *nptr, char > **restrict endptr, int base);’” > > Sorry, but the case has not been made to make any such change to strtol’s > prototype. On the contrary, what I’m mostly gathering from the discussion is > that ‘restrict’ can be confusing, which is not news. > > n3220 §6.7.4.2 examples 5 through 7 demonstrate that the C committee has > thought through the points you’re making. (These examples were not present > in C11.) This may help to explain why the standard specifies strtol with > ‘restrict’ on both arguments. > -- <https://www.alejandro-colomar.es/>
Attachment:
signature.asc
Description: PGP signature