Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

Alex Colomar <alx.manpages@xxxxxxxxx> · Tue, 29 Nov 2022 18:28:51 +0100

Hi Martin and Michael,

On 11/29/22 17:58, Uecker, Martin wrote:

Hi,

Am Dienstag, dem 29.11.2022 um 15:44 +0000 schrieb Michael Matz:
Hey,

On Tue, 29 Nov 2022, Uecker, Martin wrote:

It does not require any changes on how arrays are represented.

As part of VM-types the size becomes part of the type and this
can be used for static or dynamic analysis, e.g. you can
- today - get a run-time bounds violation with the sanitizer:

void foo(int n, char (*buf)[n])
{
   (*buf)[n] = 1;
}

This can already statically analyzed as being wrong, no need for
dynamic checking.

In this toy example, but in general in can be checked
only at run-time by using the information about the
dynamic bound.

What I mean is the checking of the claimed contract.
Above you assure for the function body that buf has n elements.

Yes.

This is also a pre-condition for calling this function and
_that_ can't be checked in all  cases because:

   void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
   void callfoo(char * buf) { foo(10, buf); }

buf doesn't have a known size.

This does not type check.

  And a pre-condition that can't be checked
is no pre-condition at all, as only then it can become a guarantee
for the body.

The example above should look like:

void foo(int n, char (*buf)[n]);

void callfoo(char (*buf)[12]) { foo(10, buf); }

This could be checked by an UB sanitizer as calling
the function with an argument of incompatible type
is UB (but we currently do not do this)

If you think about

void foo(int n, char buf[n]);

void callfoo(char *buf) { foo(10, buf); }

Then you are right that this can not be checked at this
time. But this  does not mean it is useless because we
still can detect inconsistencies in other cases:

void callfoo(int n, char buf[n - 1]) { foo(n, buf); }

We could also - in the future - have a warning about all
situations where bound information is lost, making sure
that preconditions are always checked for people who
consistently use these annotations.

The compiler has no choice than to trust the user that the pre-
condition  for calling foo is fulfilled.  I can see how
being able to just check half  of the contract might be
useful, but if it doesn't give full checking then
any proposal for syntax should be even more obviously
orthogonal than the current one.

Your argument is not clear to me.

For

void foo(int n, char buf[n]);

it semantically has no meaning according to the C standard,
but a compiler could still warn.

Hmm?  Warn about what in this decl?

I meant, we could warn about something like this
because it is likely an error:

void foo(int n, char buf[n])
{
   buf[n] = 1;
}

It could also warn for

void foo(int n, char buf[n]);

int main()
{
     char buf[9];
     foo(buf);
}

You mean if you write 'foo(10,buf)' (the above, as is, is simply a
syntax error for non-matching number of args).  Or was it a mispaste
and you mean  the one from the godbolt link, i.e.:

I meant:

char buf[9];
foo(10, buf);

In fact, it turns out we warn already:

https://godbolt.org/z/qcvsv87Ev

void foo(char buf[10]){ buf[9] = 1; }
int main()
{
     char buf[9];
     foo(buf);
}

?  If so, yeah, we warn already.  I don't think this is an argument
for (or against) introducing new syntax.
...

It is argument for having this syntax, because we could
extend such warning (those we already have and those we
could still add) to more common cases such as

void foo(char buf[.n], size_t n);

In my opinion, this would a huge step forward for
safety of C programs as we already have a lot of
infrastructure for checking bounds.

Of course, the existing GNU extension would achieve
the same thing:

void foo(size_t n; char buf[n], size_t n);

But in general: This feature is useful not only for documentation
but also for analysis.

Which feature we're talking about now?  The ones you used all work
today,
as you demonstrated.  I thought we would be talking about that
".whatever"
syntax to refer to arbitrary parameters, even following ones?  I
think a
disrupting syntax change like that should have a higher bar than "in
some
cases, depending on circumstance, we might even be able to warn".

We can use our existing features and then apply them
to cases where the bound is specified after the pointer,
which is more common in practice.

Yep; basically adding some (not perfect, but some) static analysis to 
many libc function calls.

Also, considering the issues with sizeof and arrays, and the lack of a 
_Nitems() [proposed as _Lengthof()] operator, there's a lot of manual 
work in array (read pointer) parameters.

However, a hypothetical _Nitems() operator could make use of this 
syntactic sugar, and be more useful than just providing static analysis. 
 Using _Nitems() on a VMT (including pointer parameters) could be 
specified to return the number of elements, so I foresee code like:

void foo(int arr[nmemb], size_t nmemb)
{
        // _Nitems() evaluates to nmemb
        for (size_t i = 0; i < _Nitems(arr); i++)
                arr[i] = i;
}

void bar(int arr[])
{
        // Constraint violation
        for (size_t i = 0; i < _Nitems(arr); i++)
                arr[i] = i;
}

This is probably the most useful part of this feature (but admittedly 
it's not only about this feature, or even could be added without this 
feature).

Martin

Cheers,

Alex

--
<http://www.alejandro-colomar.es/>

Attachment:
OpenPGP_signature

Description: OpenPGP digital signature