Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

Martin Uecker <uecker@xxxxxxxxx> · Sat, 03 Sep 2022 14:47:00 +0200

...
> > 
> >        Whether or not you feel like the manpages are the best place to 
> > start that, I'll leave up to you!
> 
> I'll try to defend the reasons to start this in the man-pages.
> 
> This feature is mostly for documentation purposes, not being meaningful 
> for code at all (for some meaning of meaningful), since it won't change 
> the function definition in any way, nor the calls to it.  At least not 
> by itself; static analysis may get some benefits, though.

GCC will warn if the bound is specified inconsistently between
declarations and also emit warnings if it can see that a buffer
which is passed is too small:

https://godbolt.org/z/PsjPG1nv7

BTW: If you declare pointers to arrays (not first elements) you
can get run-time bounds checking with UBSan:

https://godbolt.org/z/TvMo89WfP

> 
> Also, new code can be designed from the beginning so that sizes go 
> before their corresponding arrays, so that new code won't typically be 
> affected by the lack of this feature in the language.
> 
> This leaves us with legacy code, especially libc, which just works, and 
> doesn't have any urgent needs to change their prototypes in this regard 
> (they could, to improve static analysis, but not what we'd call urgent).

It would be useful step to find out-of-bounds problem in
applications using libc.

> And since most people don't go around reading libc headers searching for 
> function declarations (especially since there are manual pages that show 
> them nicely), it's not like the documentation of the code depends on how 
> the function is _actually_ declared in code (that's why I also defended 
> documenting restrict even if glibc wouldn't have cared to declare it), 
> but it depends basically on what the manual pages say about the 
> function.  If the manual pages say a function gets 'restrict' params, it 
> means it gets 'restrict' params, no matter what the code says, and if it 
> doesn't, the function accepts overlapping pointers, at least for most of 
> the public (modulo manual page bugs, that is).
> 
> So this extension could very well be added by the manual pages, as a 
> form of documentation, and then maybe picked up by compilers that have 
> enough resources to implement it.
> 
> 
> Considering that this feature is mostly about documentation (and a bit 
> of static analysis too), the documentation should be something appealing 
> to the reader.
> 
> 
> Let's take an example:
> 
> 
>         int getnameinfo(const struct sockaddr *restrict addr,
>                         socklen_t addrlen,
>                         char *restrict host, socklen_t hostlen,
>                         char *restrict serv, socklen_t servlen,
>                         int flags);
> 
> and some transformations:
> 
> 
>         int getnameinfo(const struct sockaddr *restrict addr,
>                         socklen_t addrlen,
>                         char host[restrict hostlen], socklen_t hostlen,
>                         char serv[restrict servlen], socklen_t servlen,
>                         int flags);
> 
> 
>         int getnameinfo(socklen_t hostlen;
>                         socklen_t servlen;
>                         const struct sockaddr *restrict addr,
>                         socklen_t addrlen,
>                         char host[restrict hostlen], socklen_t hostlen,
>                         char serv[restrict servlen], socklen_t servlen,
>                         int flags);
> 
> (I'm not sure if I used correct GNU syntax, since I never used that 
> extension myself.)
> 
> The first transformation above is non-ambiguous, as concise as possible, 
> and its only issue is that it might complicate the implementation a bit 
> too much.  I don't think forward-using a parameter's size would be too 
> much of a parsing problem for human readers.

I personally find the second form not terrible.  Being
able to read code left-to-right, top-down is helpful in more
complicated examples.

> The second one is unnecessarily long and verbose, and semicolons are not 
> very distinguishable from commas, for human readers, which may be very 
> confusing.
> 
>         int foo(int a; int b[a], int a);
>         int foo(int a, int b[a], int o);
> 
> Those two are very different to the compiler, and yet very similar to 
> the human eye.  I don't like it.  The fact that it allows for simpler 
> compilers isn't enough to overcome the readability issues.

This is true, I would probably use it with a comma and/or
syntax highlighting.

> I think I'd prefer having the forward-using syntax as a non-standard 
> extension --or a standard but optional language feature-- to avoid 
> forcing small compilers to implement it, rather than having the GNU 
> extension standardized in all compilers.

The problems with the second form are:

- it is not 100% backwards compatible (which maybe ok though) as
the semantics of the following code changes:

int n;
int foo(int a[n], int n); // refers to different n!

Code written for new compilers could then be misunderstood
by old compilers when a variable with 'n' is in scope.

- it would generally be fundamentally new to C to have
backwards references and parser might need to be changes
to allow this

- a compiler or tool then has to deal also with ugly
corner cases such as mutual references:

int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);

We could consider new syntax such as

int foo(char buf[.n], int n);

Personally, I would prefer the conceptual simplicity of forward
declarations and the fact that these exist already in GCC
over any alternative.  I would also not mind new syntax, but
then one has to define the rules more precisely to avoid the
aforementioned problems. 

Martin