Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

Alejandro Colomar <alx.manpages@xxxxxxxxx> · Sun, 13 Nov 2022 14:19:24 +0100

Hi Martin!

On 11/12/22 16:56, Martin Uecker wrote:
Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:

Since it's to be used as an rvalue, not as an lvalue, I guess a
postfix-expression wouldn't be the right one.

Several forms of postfix-expression are only rvalues.

(with a special rule about how the identifier is interpreted, different
from the normal scope rules)?  If so, then ".a = 1" could either match
assignment-expression directly (assigning to the postfix-expression ".a").

No, assigning to a function parameter from within another parameter
declaration wouldn't make sense.  They should be readonly.  Side effects
should be forbidden, I think.

Such assignments are already allowed.  In a function definition, the side
effects (including in size expressions for array parameters adjusted to
pointers) take place before entry to the function body.

And, in any case, if you did have a constraint disallowing such
assignments, it wouldn't suffice for syntactic disambiguation (see the
previous point I made about that; I have some rough notes towards a WG14
paper on syntactic disambiguation, but haven't converted them into a
coherent paper).

My idea was to only allow

array-declarator : direct-declarator [ . identifier ]

and only for parameters (not nested inside structs declared
in the parameter list) as a first step, because it seems this
would exclude all difficult cases.

But if we need to allow more complicated expressions, then
it starts getting more complicated.

Ahh, I guess my work in documenting the man-pages prototypes got me thinking of 
those extensions to the idea.  I don't remember all the details :)

One could allow more generic expressions, and specify
that the .identifier refers to a parameter in the
nearest lexically enclosing parameter list or
struct/union.

Then

void foo(struct bar { int x; char c[.x] } a, int x);

would not be allowed (which is good because then we
could later use the syntax also inside structs). If
we apply scoping rules, the following would work:

struct bar { int y; };
void foo(char p[((struct bar){ .y = .x }).y], int x);

Makes sense.

But not:

struct bar { int y; };
void foo(char p[((struct bar){ .y = .y }).y], int y);

Although it clearly is nonsense, I'm not sure I'd make it a constraint
violation; I'd rather make it Undefined Behavior.  How is it different from this?

$ cat foo.c
int main(void)
{
	int i = i;
	return i;
}

$ gcc --version | head -n1
gcc (Debian 12.2.0-9) 12.2.0
$ gcc -Wall -Wextra -Werror foo.c
$

$ clang --version | head -n1
Debian clang version 14.0.6
$ clang -Wall -Wextra -Werror foo.c
foo.c:3:10: error: variable 'i' is uninitialized when used within its own 
initialization [-Werror,-Wuninitialized]
        int i = i;
            ~   ^
1 error generated.

BTW, I just freaked out that GCC can't catch this trivial bug.  Should I open a 
bug report?

But there are not only syntactical problems, because
also the type of the parameter might become relevant
and then you can get circular dependencies:

void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);

This seems to be a difficult stone in the road.

I am not sure what would be the best way to fix it. One
could specify that parameters referred to by
the .identifier syntax must be of some integer type and
that the sub-expression .identifier is always
converted to 'size_t'.

That makes sense, but then overnight some quite useful thing came to my mind 
that would not be possible with this limitation:

<https://software.codidact.com/posts/285946>

char *
stpecpy(char dst[.end - .dst], char *src, char end[1])
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';

#if !defined(NDEBUG)
	/* Consume the rest of the input string. */
	while (*src++) {};
#endif

	return end + 1;
}

stpecpy() is a function similar to strlcat(3) that gets a pointer to the end of 
the array instead of the size of the buffer.  This allows chaining without 
having performance issues[1].

[1]: <https://en.wikichip.org/wiki/schlemiel_the_painter%27s_algorithm>
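
To illustrate the chaining, here is a minimal compilable sketch.  It assumes the
same loop as above, with the proposed [.end - .dst] notation replaced by plain
pointers and the debug-only loop dropped.  join3() is a hypothetical helper made
up for this example; it is not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same loop as stpecpy() above, written with today's syntax.
   'end' points to the last element of the buffer, not one past it. */
static char *
stpecpy(char *dst, const char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';
	return end + 1;
}

/* Hypothetical helper: concatenate three strings into buf[size] (size > 0).
   Returns true if the result had to be truncated. */
static bool
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
	char *end = buf + size - 1;
	char *p = buf;

	/* Each call resumes where the previous one stopped, so no call
	   has to rescan the string built so far. */
	p = stpecpy(p, a, end);
	p = stpecpy(p, b, end);
	p = stpecpy(p, c, end);

	/* After a truncation, every later call returns end + 1 again,
	   so a single check at the end of the chain is enough. */
	return p == end + 1;
}
```

With a 16-byte buffer, join3(buf, 16, "foo", "bar", "baz") leaves "foobarbaz" in
buf and reports no truncation; with an 8-byte buffer it leaves "foobarb" and
reports truncation.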

Maybe allowing integral types and pointers would be enough.  However, foreseeing 
that the _Lengthof() proposal (BTW, which paper was it?) will succeed, and 
combining it with this one, _Lengthof(pointer) would ideally give the length of 
the array, so allowing pointers would conflict.

My solution is to disallow sizeof() and _Lengthof() on .identifier.  That could 
be done simply by saying that variably-modified types (VMT) are incomplete types 
until immediately after the comma that follows the parameter declaration. 
Therefore it would be allowed only in the same way as it is allowed right now 
with the normal syntax (i.e., after the parameter has been seen).
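
As a reference point, here is a minimal sketch of what the normal syntax already
allows today, where each parameter is only used after it has been seen.
row_size() and row_size_demo() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* 'n' is declared before it appears in the type of 'a', and 'a' is declared
   before 'sizeof *a' appears in the type of 'b'.  No forward reference is
   needed, so no circular dependency can arise. */
static size_t
row_size(size_t n, char (*a)[n], char (*b)[sizeof *a])
{
	(void) a;
	/* '*b' has a variably-modified type, so this sizeof is evaluated
	   at run time and yields n. */
	return sizeof *b;
}

static size_t
row_size_demo(void)
{
	char x[7], y[7];

	return row_size(sizeof x, &x, &y);
}
```

Under the rule sketched above, a forward reference such as sizeof *.b would be
rejected because the type of b is still incomplete at that point, while the
backward reference in row_size() keeps working as it does now.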

BTW, what was the number of the latest paper for _Lengthof() and what happened 
to it?  I guess it's likely to be added to C3x, isn't it?

And another BTW:  some projects are fairly consistent in how they name sizes,
and I have a pending review of the Linux man-pages to make them consistent
too.

See the following table of usual conventions:

Operator/macro:                 variable names;    Description.
------------------------------|------------------|---------------------
strlen(3):                      length, len, l;    String length.
sizeof():                       size, sz, nbytes;  Identifier size in bytes.
nitems(), nelems():             n, nelem, nitems;  Array number of elements.
sizeof_array(), array_bytes():  size, sz, nbytes;  Array size in bytes.

Naming _Lengthof() the operator that gets the number of elements in an array 
would create naming confusion, since then length can mean two different things. 
I suggest _Nitemsof().
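
A minimal sketch of the three meanings side by side.  The nitems() definition
below is the conventional one and is written out here as an assumption, rather
than taken from any particular header; the array names are made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Conventional definition, assumed here; only valid for true arrays,
   not for pointers. */
#define nitems(a)  (sizeof(a) / sizeof((a)[0]))

static const char greeting[16] = "hello";
static const int  primes[]     = {2, 3, 5, 7};

/* length: characters before the terminating null byte. */
static size_t
greeting_len(void)
{
	return strlen(greeting);
}

/* size: bytes occupied by the whole array. */
static size_t
primes_size(void)
{
	return sizeof(primes);
}

/* n: number of elements in the array. */
static size_t
primes_n(void)
{
	return nitems(primes);
}
```

For greeting, the length is 5 while both the size and the number of elements are
16; for primes, the number of elements is 4 while the size is 4 * sizeof(int).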

Maybe one should also add a constraint that all new
type length expressions, i.e. those using the new syntax,
cannot have side effects. Or even that they follow
all the rules of integer constant expressions with
the fictitious assumption that all .identifier
sub-expressions are integer constant expressions.
The rationale being that this would facilitate
compile time reasoning about length expressions.

Martin

Cheers,

Alex

--
<http://www.alejandro-colomar.es/>