On Sun, 2024-07-07 at 14:42 +0200, Alejandro Colomar wrote: > Hi Paul, > > On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote: > > On 7/7/24 03:58, Alejandro Colomar wrote: > > > > > I've incorporated feedback, and here's a new revision, let's call > > > it > > > v0.2, of the draft for a WG14 paper. > > Although I have not followed the email discussion closely, I read > > v0.2 and > > think that as stated there is little chance that its proposal will > > be > > accepted. > > Thanks for reading thoroughly, and the feedback! > > > Fundamentally the proposal is trying to say that there are two > > styles X and > > Y for declaring strtol and similar functions, and that although > > both styles > > are correct in some sense, style Y is better than style X. However, > > the > > advantages of Y are not clearly stated and the advantages of style > > X over Y > > are not admitted, so the proposal is not making the case clearly > > and fairly. > > > > One other thing: to maximize the chance of a proposal being > > accepted, please > > tailor it for its expected readership. The C committee is expert on > > ‘restrict’, so don’t try to define ‘restrict’ in your own way. > > Unless merely > > repeating the language of the standard, any definition given for > > ‘restrict’ > > is likely to cause the committee to quibble with the restatement of > > the > > standard wording. (It is OK to mention some corollaries of the > > standard > > definition, so long as the corollaries are not immediately > > obvious.) > > > > Here are some comments about the proposal. At the start these > > comments are > > detailed; towards the end, as I could see the direction the > > proposal was > > headed and was convinced it wouldn’t be accepted as stated, the > > comments are > > less detailed. > > > > > > "The API may copy" > > > > One normally doesn’t think of the application programming interface > > as > > copying. Please replace the phrase “the API” with “the caller” or > > “the > > callee” as appropriate. (Although ‘restrict’ can be used in places > > other > > than function parameters, I don’t think the proposal is concerned > > about > > those cases and so it doesn’t need to go into that.) > > Ok. > > > "To avoid violations of for example C11::6.5.16.1p3," > > > > Code that violates C11::6.5.16.1p3 will do so regardless of whether > > ‘restrict’ is present. I would not mention C11::6.5.16.1p3 as it’s > > a red > > herring. Fundamentally, ‘restrict’ is not about the consequences of > > caching > > when one does overlapping moves; it’s about caching in a more > > general sense. > > The violations are UB regardless of restrict, but consistent use of > restrict allows the caller to have a rough model of what the callee > will > do with the objects, and prevent those violations via compiler > diagnostics. I've reworded that part to make it more clear why I'm > mentioning that. > > > “As long as an object is only accessed via one restricted pointer, > > other > > restricted pointers are allowed to point to the same object.” > > > > “only accessed” → “accessed only” > > Ok. > > > “This is less strict than I think it should be, but this proposal > > doesn’t > > attempt to change that definition.” > > > > I would omit this sentence and all similar sentences. Don’t > > distract the > > reader with other potential proposals. The proposal as it stands is > > complicated enough. > > Ok. > > > “return ca > a;” > > “return ca > *ap;” > > > > I fail to understand why these examples are present. It’s not > > simply that > > nobody writes code like that: the examples are not on point. I > > would remove > > the entire programs containing them, along with the sections that > > discuss > > them. When writing to the C committee one can assume the reader is > > expert in > > ‘restrict’, there is no need for examples such as these. > > Those are examples of how consistent use of restrict can --or could, > in > the case of g()-- detect, via compiler diagnostics, (likely) > violations > of seemingly unrelated parts of the standard, such as the referenced > C11::6.5.16.1p3, or in this case, C11::6.5.8p5. > > > “strtol(3) accepts 4 objects via pointer parameters and global > > variables.” > > > > Omit the “(3)”, here and elsewhere, as the audience is the C > > standard > > committee. > > The C standard committee doesn't know about historic use of (3)? > That > predates the standard, and they built on top of that (C originated in > Unix). While they probably don't care about it anymore, I expect my > paper to be read by other audience, including GCC and glibc, and I > prefer to keep it readable for that audience. I expect the standard > committee to at least have a rough idea of the existence of this > syntax, > and respect it, even if they don't use it or like it. > > > “accepts” is a strange word to use here: normally one says > > “accepts” to talk > > about parameters, not global variables. > > The thing is, strtol(3) does not actually access *endptr. I thought > that might cause more confusion than using "accepts". > > > Also, “global variables” is not > > right here. The C standard allows strtol, for example, to read and > > write an > > internal static cache. (Yes, that would be weird, but it’s > > allowed.) > > That's not part of the API. A user must not access internal static > cache, and so the implementation is free to assume that it doesn't, > regardless of the use of restrict in the API, so it is not relevant > for > the purpose of this discussion, I think. > > > I > > suggest rephrasing this sentence to talk about accessing, not > > accepting. > > I don't want to use accessing, for it would be inconsistent with > later > saying that *endptr is not accessed. However, I'm open to other > suggested terms that might be more appropriate than both. > > > “endptr access(write_only) ... *endptr access(none)” > > > > This is true for glibc, but it’s not necessarily true for all > > conforming > > strtol implementations. If endptr is non-null, a conforming strtol > > implementation can both read and write *endptr; > > It can't, I think. It's perfectly valid to pass an uninitialized > endptr, which means the callee must not read the original value. > > char *end; > strtol("0", &end, 0); > > If strtol(3) would be allowed to read it, the user would need to > initialize it. > > > it can also both read and > > write **endptr. (Although it would need to write before reading, > > reading is > > allowed.) > > Here, we need to consider two separate objects. The object pointed- > to > by *endptr _before_ the object pointed to by endptr is written to, > and > the object pointed-to by *endptr _after_ the object pointed to by > endptr > is written to. > > For the former (the original *endptr): > > Since *endptr might be uninitialized, strtol(3) must NOT > access > the object pointed to by an uninitialized pointer. > > For the latter (the final *endptr): > > The callee cannot write to it, since the specification of the > function is that the string will not be modified. And in any > case, such an access is ultimately derived from nptr, not > from > *endptr, so it does not matter for the discussion of *endptr. > > Of course, that's derived from the specification of the function, and > not from its prototype, since ISO C doesn't provide such detailed > prototypes (since it doesn't have the [[gnu::access()]] attribute). > But > the standard must abide by its own specification of functions, > anyway. > > > “This qualifier helps catch obvious bugs such as strtol(p, p, 0) > > and > > strtol(&p, &p, 0) .” > > > > No it doesn’t. Ordinary type checking catches those obvious bugs, > > and > > ‘restrict’ provides no additional help there. Please complicate the > > examples > > to make the point more clearly. > > To be pedantic, I didn't specify the type of p, so it might be (void > *), > and thus avoid type checking at all. However, to avoid second > guessing > from the standards committee, I'll add casts, to make it more obvious > that restrict is catching those. > > > “The caller knows that errno doesn’t alias any of the function > > arguments.” > > > > Only because all args are declared with ‘restrict’. So if the > > proposal is > > accepted, the caller doesn’t necessarily know that. > > Not really. The caller has created the string (or has received it > via a > restricted pointer), and so it knows it's not derived from errno. > > char buf[LINE_MAX + 1]; > > fgets(...); > n = strtol(buf, ...); > > This caller knows with certainty that errno does not alias buf. Of > course, in some complex cases, it might not know, but I ommitted that > for simplicity. And in any case, I don't think any optimizations are > affected by that in the caller. > > > > > > > “The callee knows that *endptr is not accessed.” > > > > This is true for glibc, but not necessarily true for every > > conforming strtol > > implementation. > > The original *endptr may be uninitialized, and so must not be > accessed. > > > “It might seem that it’s a problem that the callee doesn’t know if > > nptr can > > alias errno or not. However, the callee will not write to the > > latter > > directly until it knows it has failed,” > > > > Again this is true for glibc, but not necessarily true for every > > conforming > > strtol implementation. > > An implementation is free to set errno = EDEADLK in the middle of it, > as > long as it later removes that. However, I don't see how it would > make > any sense. > > If that's done, it's probably done via a helper internal function, > which > as said below, can use restrict for nptr, and thus know with > certainty > that nptr is distinct from errno. > > If that's done directly in the body of strtol(3) (the only place > where > it's not known that nptr is distinct from errno) we can probably > agree > that the implementation is doing that just for fun, and doesn't care > about optimization, and thus we can safely ignore it. > > > To my mind this is the most serious objection. The current standard > > prohibits calls like strtol((char *) &errno, 0, 0). The proposal > > would relax > > the standard to allow such calls. In other words, the proposal > > would > > constrain implementations to support such calls. > > I don't think it does. ISO C specifies that strtol(3) takes a string > as > its first parameter, and errno is not (unless you do this:). > > (char *)&errno = "111"; > > Okay, let's assume you're allowed to do that, since a char* can alias > anything. > > I still don't think ISO C constrains implementations to allow passing > (char *)&errno as a char*, just because it's not restrict. Let's > find > an ISO C function that accepts a non-restrict string: > > int system(const char *string); > > Does ISO C constrain implementations to support system((char > *)&errno)? > I don't think so. Maybe it does implicitly because of a defect in > the > wording, but even then it's widely understood that it doesn't. > > > Why is this change worth > > making? Real-world programs do not make calls like that. > > Because it makes analysis of 'restrict' more consistent. The obvious > improvement of GCC's analyzer to catch restrict violations will > trigger > false positives in normal uses of strtol(3). Hi Alejandro I'm author/maintainer of GCC's -fanalyzer option, which is presumably why you CCed me on this. One of my GSoC 2022 students (Tim Lange) looked at making use of 'restrict' in -fanalyzer, see e.g. https://lists.gnu.org/archive/html/bug-gnulib/2022-07/msg00062.html Based on Paul's comment here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99860#c2 (and its references) I came to the conclusion at the time that we should work on something else, as the meaning of 'restrict' is too ambiguous. Later, I added a new -Wanalyzer-overlapping-buffers warning in GCC 14, which simply has a hardcoded set of standard library functions that it "knows" to warn about. Has the C standard clarified the meaning of 'restrict' since that discussion? Without that, I wasn't planning to touch 'restrict' in GCC's -fanalyzer. Sorry if I'm missing anything here; I confess I've skimmed through parts of this thread. Dave > > > “But nothing prohibits those internal helper functions to specify > > that nptr > > is restrict and thus distinct from errno.” > > > > Although true, it’s also the case that the C standard does not > > *require* > > internal helper functions to use ‘restrict’. All that matters is > > the > > accesses. So I’m not sure what the point of this statement is. > > If an implementation wants to optimize, it should be careful and use > restrict. If it doesn't, then it can't complain that ISO C doesn't > allow it to. It's actually allowed to optimize, but it has to do > some > work for it. > > > “m = strtol(p, &p, 0); An analyzer more powerful than the current > > ones > > could extend the current -Wrestrict diagnostic to also diagnose > > this case.” > > > > Why would an analyzer want to do that? This case is a perfectly > > normal thing > > to do and it has well-defined behavior. > > Because without an analyzer, restrict cannot emit many useful > diagnostics. It's a qualifier that's all about data flow analysis, > and > normal diagnostics aren't able to do that. > > A qualifier that enables optimizations but doesn't enable diagnostics > is > quite dangerous, and probably better not used. If however, the > analyzer > emits advanced diagnostics for misuses of it, then it's a good > qualifier. > > Have a lovely day! > Alex > > > > > “To prevent triggering diagnostics in a powerful analyzer that > > would be > > smart enough to diagnose the example function g(), the prototype of > > strtol(3) should be changed to ‘long int strtol(const char *nptr, > > char > > **restrict endptr, int base);’” > > > > Sorry, but the case has not been made to make any such change to > > strtol’s > > prototype. On the contrary, what I’m mostly gathering from the > > discussion is > > that ‘restrict’ can be confusing, which is not news. > > > > n3220 §6.7.4.2 examples 5 through 7 demonstrate that the C > > committee has > > thought through the points you’re making. (These examples were not > > present > > in C11.) This may help to explain why the standard specifies strtol > > with > > ‘restrict’ on both arguments. > > >