Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element

René Scharfe <l.s.r@xxxxxx> · Thu, 7 Jul 2022 21:11:28 +0200

Am 07.07.22 um 20:10 schrieb Junio C Hamano:
> While our eyes are on array.cocci, I have a few observations on it.
>
> This is not meant specifically to you, Ævar, but comments by those
> more familiar with Coccinelle (and I am sure the bar to pass is
> fairly low, as I am not all that familiar) are very much
> appreciated.
>
>     @@
>     expression dst, src, n, E;
>     @@
>       memcpy(dst, src, n * sizeof(
>     - E[...]
>     + *(E)
>       ))
>
> This seems to force us prefer sizeof(*(E)) over sizeof(E[i]), when
> it is used to compute the byte size of memcpy() operation.  There is
> no reason to prefer one over the other, but I presume it is there
> only for convenience for the other rules in this file (I recall
> vaguely reading somewhere that these rules do not "execute" from top
> to bottom, so I wonder how effective it is?).

It halves the number of syntax variants to deal with.

>
>     @@
>     type T;
>     T *ptr;
>     T[] arr;
>     expression E, n;
>     @@
>     (
>       memcpy(ptr, E,
>     - n * sizeof(*(ptr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(arr, E,
>     - n * sizeof(*(arr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(E, ptr,
>     - n * sizeof(*(ptr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(E, arr,
>     - n * sizeof(*(arr))
>     + n * sizeof(T)
>       )
>     )
>
> Likewise, but this one is a lot worse.
>
> Taken alone, sizeof(*(ptr)) is far more preferrable than sizeof(T),
> because the code will be more maintainable.
>
>     Side Note.  I know builtin/mv.c had this type mismatch between
>     the variable and sizeof() from the beginning when 11be42a4 (Make
>     git-mv a builtin, 2006-07-26) introduced both the variable
>     declaration "const char **source" and memmove() on it, but a
>     code that starts out with "char **src" with memmove() that moves
>     part of src[] and uses sizeof(char *) to compute the byte size
>     of the move would become broken the same way when a later
>     developer tightens the declaration to use "const char **src"
>     without realizing that they have to update the type used in
>     sizeof().
>
> So even though I am guessing that this is to allow the later rules
> to worry only about sizeof(T), I am a bit unhappy to see the rule.
> If an existing code matched this rule and get rewritten to use
> sizeof(T), not sizeof(*(ptr)), but did not match the other rules to
> be rewritten to use COPY_ARRAY(), the overall effect would be that
> the automation made the code worse.

True.

>
>     @@
>     type T;
>     T *dst_ptr;
>     T *src_ptr;
>     T[] dst_arr;
>     T[] src_arr;
>     expression n;
>     @@
>     (
>     - memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_ptr, src_ptr, n)
>     |
>     - memcpy(dst_ptr, src_arr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_ptr, src_arr, n)
>     |
>     - memcpy(dst_arr, src_ptr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_arr, src_ptr, n)
>     |
>     - memcpy(dst_arr, src_arr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_arr, src_arr, n)
>     )
>
> I take it that thanks to the earlier "meh -- between sizeof(*p) and
> sizeof(p[0]) there is no reason to prefer one over the other" and
> "oh, no, we should prefer sizeof(*p) not sizeof(typeof(*p)) but this
> one is the other way around" rules, this one only has to deal with
> sizeof(T).
>
> Am I reading it correctly?

Yes.  Without the ugly normalization step in the middle could either
use twelve cases instead of four here or use inline alternatives,
e.g.:

type T;
T *dst_ptr;
T *src_ptr;
T[] dst_arr;
T[] src_arr;
expression n;
@@
(
- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_ptr, src_ptr, n)
|
- memcpy(dst_ptr, src_arr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_ptr, src_arr, n)
|
- memcpy(dst_arr, src_ptr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_arr, src_ptr, n)
|
- memcpy(dst_arr, src_arr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_arr, src_arr, n)
)

I seem to remember that rules like this missed some cases, but perhaps
that's no longer an issue with the latest Coccinelle version?

>
>     @@
>     type T;
>     T *dst;
>     T *src;
>     expression n;
>     @@
>     (
>     - memmove(dst, src, (n) * sizeof(*dst));
>     + MOVE_ARRAY(dst, src, n);
>     |
>     - memmove(dst, src, (n) * sizeof(*src));
>     + MOVE_ARRAY(dst, src, n);
>     |
>     - memmove(dst, src, (n) * sizeof(T));
>     + MOVE_ARRAY(dst, src, n);
>     )
>
> What I find interesting is that this one seems to be able to do the
> necessary rewrite without having to do the "turn everything into
> sizeof(T) first" trick.  If this approach works well, I'd rather see
> the COPY_ARRAY() done without the first two preliminary rewrite
> rules.

It doesn't support arrays (T[]).  That doesn't matter in practice
because we don't have such cases (yet?).

>
> I wonder if the pattern in the first rule catches sizeof(dst[0])
> instead of sizeof(*dst), though.

It doesn't.

>
>     @@
>     type T;
>     T *ptr;
>     expression n;
>     @@
>     - ptr = xmalloc((n) * sizeof(*ptr));
>     + ALLOC_ARRAY(ptr, n);
>
>     @@
>     type T;
>     T *ptr;
>     expression n;
>     @@
>     - ptr = xmalloc((n) * sizeof(T));
>     + ALLOC_ARRAY(ptr, n);
>
> Is it a no-op rewrite if we replace the above two rules with
> something like:
>
> .   @@
> .   type T;
> .   T *ptr;
> .   expression n;
> .   @@
> .   (
> .   - ptr = xmalloc((n) * sizeof(*ptr));
> .   + ALLOC_ARRAY(ptr, n);
> .   |
> .   - ptr = xmalloc((n) * sizeof(T));
> .   + ALLOC_ARRAY(ptr, n);
> .   )

I think so.

>
> or even following the pattern of the next one ...
>
> .   @@
> .   type T;
> .   T *ptr;
> .   expression n;
> .   @@
> .   - ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
> .   + ALLOC_ARRAY(ptr, n);
>
> ... I have to wonder?  I like the simplicity of this pattern.

In theory this is equivalent.

>
>     @@
>     type T;
>     T *ptr;
>     expression n != 1;
>     @@
>     - ptr = xcalloc(n, \( sizeof(*ptr) \| sizeof(T) \) )
>     + CALLOC_ARRAY(ptr, n)
>
> And this leaves xcalloc(1, ...) alone because it is a way to get a
> cleared block of memory that may not be an array at all.  Shouldn't
> we have "n != 1" for xmalloc rule as well, I wonder, if only for
> consistency?

I agree that a single-element array is a bit awkward, so allowing the
explicit sizeof in that case is less iffy.  ALLOC/CALLOC macros for
single items might make that automation more palatable..

René