Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

Alejandro Colomar <alx.manpages@xxxxxxxxx> · Sat, 3 Sep 2022 15:41:15 +0200

Hi Martin,

On 9/3/22 14:47, Martin Uecker wrote:
[...]

GCC will warn if the bound is specified inconsistently between
declarations and also emit warnings if it can see that a buffer
which is passed is too small:

https://godbolt.org/z/PsjPG1nv7

That's very good news!

BTW, it's nice to see that GCC doesn't need 'static' for array
parameters.  I never understood what the static keyword adds there.
There's no way one can specify an array size and mean anything other than
requiring that, for a non-null pointer, the array should have at least
that size.

BTW: If you declare pointers to arrays (not first elements) you
can get run-time bounds checking with UBSan:

https://godbolt.org/z/TvMo89WfP

Couldn't that be caught at compile time?  Index n is always out of
bounds for such an array, since the last valid index is n-1.
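
To make the pointer-to-array idea concrete, here is a minimal sketch.
The function name sum_array is made up for illustration and is not
taken from the godbolt link.  Because the parameter points to the whole
array, the bound n is part of the pointed-to type, which is what gives
a bounds sanitizer something to check against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: 'a' points to an array of exactly n ints,
 * so sizeof(*a) is n * sizeof(int) inside the function. */
int sum_array(size_t n, int (*a)[n])
{
    assert(a != NULL);

    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += (*a)[i];       /* valid indices are 0 .. n-1 */

    /* Accessing (*a)[n] here would be the out-of-bounds case
     * discussed above. */
    return total;
}
```

A caller passes the address of the array, e.g. sum_array(4, &v) for
"int v[4]", rather than a pointer to its first element.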

Also, new code can be designed from the beginning so that sizes go
before their corresponding arrays, so that new code won't typically be
affected by the lack of this feature in the language.
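
As a small sketch of what "sizes before arrays" looks like in standard
C (the function fill_with is hypothetical, not a libc interface): since
n is declared before buf, the bound can be written without any
extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills buf with n-1 copies of c and terminates
 * it.  The size parameter precedes the array, so 'char buf[n]' refers
 * to an already-declared parameter. */
size_t fill_with(size_t n, char buf[n], char c)
{
    assert(buf != NULL && n > 0);

    memset(buf, c, n - 1);
    buf[n - 1] = '\0';
    return n - 1;               /* number of characters written */
}
```

The parameter is still adjusted to 'char *', so the bound only serves
as documentation and as input for compiler diagnostics.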

This leaves us with legacy code, especially libc, which just works, and
has no urgent need to change its prototypes in this regard (it could,
to improve static analysis, but that's not what we'd call urgent).

It would be a useful step toward finding out-of-bounds problems in
applications using libc.

Yep, it would be very useful for that.  Not urgent, but yes, very useful.

Let's take an example:

         int getnameinfo(const struct sockaddr *restrict addr,
                         socklen_t addrlen,
                         char *restrict host, socklen_t hostlen,
                         char *restrict serv, socklen_t servlen,
                         int flags);

and some transformations:

         int getnameinfo(const struct sockaddr *restrict addr,
                         socklen_t addrlen,
                         char host[restrict hostlen], socklen_t hostlen,
                         char serv[restrict servlen], socklen_t servlen,
                         int flags);

         int getnameinfo(socklen_t hostlen;
                         socklen_t servlen;
                         const struct sockaddr *restrict addr,
                         socklen_t addrlen,
                         char host[restrict hostlen], socklen_t hostlen,
                         char serv[restrict servlen], socklen_t servlen,
                         int flags);

(I'm not sure if I used correct GNU syntax, since I never used that
extension myself.)

The first transformation above is non-ambiguous, as concise as possible,
and its only issue is that it might complicate the implementation a bit
too much.  I don't think forward-using a parameter's size would be too
much of a parsing problem for human readers.

I personally find the second form not terrible.  Being
able to read code left-to-right, top-down is helpful in more
complicated examples.

The second one is unnecessarily long and verbose, and semicolons are not
very distinguishable from commas, for human readers, which may be very
confusing.

         int foo(int a; int b[a], int a);
         int foo(int a, int b[a], int o);

Those two are very different to the compiler, and yet very similar to
the human eye.  I don't like it.  The fact that it allows for simpler
compilers isn't enough to overcome the readability issues.

This is true, I would probably use it with a comma and/or
syntax highlighting.

I think I'd prefer having the forward-using syntax as a non-standard
extension --or a standard but optional language feature-- to avoid
forcing small compilers to implement it, rather than having the GNU
extension standardized in all compilers.

The problems with the second form are:

- it is not 100% backwards compatible (which may be OK, though), as
the semantics of the following code change:

int n;
int foo(int a[n], int n); // refers to different n!

Code written for new compilers could then be misunderstood
by old compilers when a variable named 'n' is in scope.

Hmmm, this one is serious.  I can't seem to solve it with that syntax.

- it would generally be fundamentally new to C to have
backwards references, and parsers might need to be changed
to allow this

- a compiler or tool then has to deal also with ugly
corner cases such as mutual references:

int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
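
The first problem in the list above (the file-scope 'n') can be
observed with today's compilers.  This is only a sketch: the function
name bound_seen is invented, and a pointer to an array is used so that
sizeof can reveal which 'n' the bound picked up.

```c
#include <assert.h>
#include <stddef.h>

int n = 3;      /* file-scope variable that happens to share the name */

/* Under current C rules the parameter 'n' is not yet declared where
 * the bound is written, so '[n]' refers to the file-scope 'n'. */
size_t bound_seen(int (*a)[n], int n)
{
    (void)n;                    /* the parameter is deliberately unused */
    assert(a != NULL);

    /* Element count of the array type that 'a' points to. */
    return sizeof(*a) / sizeof((*a)[0]);
}
```

Called as bound_seen(&v, 10) with "int v[3]", this yields the bound
taken from the file-scope variable rather than from the argument 10,
which is the meaning a forward-using rule would silently change.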

We could consider new syntax such as

int foo(char buf[.n], int n);

Personally, I would prefer the conceptual simplicity of forward
declarations and the fact that these exist already in GCC
over any alternative.  I would also not mind new syntax, but
then one has to define the rules more precisely to avoid the
aforementioned problems.

What about taking something from K&R functions for this?:

int foo(q; w; int a[q], int q, int s[w], int w);

By not specifying the types, the syntax is again short.
This is left-to-right, so no problems with global variables, and no need 
for complex parsers.
Also, by not specifying types, now it's more obvious to the naked eye 
that there's a difference:

          int foo(a; int b[a], int a);
          int foo(int a, int b[a], int o);

What do you think about this syntax?

Thanks,

Alex

--
Alejandro Colomar
<http://www.alejandro-colomar.es/>