Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element

René Scharfe <l.s.r@xxxxxx> · Thu, 7 Jul 2022 20:27:44 +0200

Am 07.07.22 um 14:11 schrieb Ævar Arnfjörð Bjarmason:
>
> On Wed, Jul 06 2022, Junio C Hamano wrote:
>
>> The variables 'source', 'destination', and 'submodule_gitfile' are
>> all of type "const char **", and an element of such an array is of
>> "type const char *".
>>
>> Noticed while running "make contrib/coccinelle/array.cocci.patch".
>>
>> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
>> ---
>>
>>  * There is this rule in the array.cocci file:
>>
>>         @@
>>         type T;
>>         T *dst;
>>         T *src;
>>         expression n;
>>         @@
>>         (
>>         - memmove(dst, src, (n) * sizeof(*dst));
>>         + MOVE_ARRAY(dst, src, n);
>>         |
>>         - memmove(dst, src, (n) * sizeof(*src));
>>         + MOVE_ARRAY(dst, src, n);
>>         |
>>         - memmove(dst, src, (n) * sizeof(T));
>>         + MOVE_ARRAY(dst, src, n);
>>         )
>>
>>    but it triggered only for modes[] array among the four parallel
>>    arrays that are being moved here.
>>
>>    There are a few tangents.
>>
>>    * Should we in general use sizeof(TYPE) in these cases, instead
>>      of the size of the zeroth element, e.g.
>>
>>  		memmove(source + i, source + i + 1,
>> 			n * sizeof(source[i]));
>>
>>      It would have been caught by the above Coccinelle rule (we are
>>      taking the size of *dst).

You mean using normalization rules for memmove like we have for memcpy,
to cover a wider range of combinations without having explicit rules for
each one?  Makes sense -- memcpy and memmove take the same arguments, so
we should treat them the same.

>>
>>    * Shouldn't we have an array of struct with four members, instead
>>      of four parallel arrays, e.g.
>>
>> 		struct {
>> 			const char *source;
>> 			const char *destination;
>> 			enum update_mode mode;
>> 			const char *submodule_gitfile;
>> 		} *mv_file;

Possibly, but that would require restructuring the code.  Currently the
function internal_prefix_pathspec() is used to build the string arrays
source and destination separately.

>>
>>    The latter question is important to answer before we accept
>>    Coccinelle-suggested rewrite to use four MOVE_ARRAY() on these
>>    four parallel arrays on top of this fix.
>
> This seems to be the same sort case as noted in 7bd97d6dff3 (git: use
> COPY_ARRAY and MOVE_ARRAY in handle_alias(), 2019-09-19).
>
> I tried (going aginst the warnings in that commit about the
> non-generality) updating the rules to catch these cases, which yielded
> the below. I wonder if we should apply some version of it. I.e. one-off
> fix these, and perhaps have the cocci rule BUG() if we see this sort of
> code again (the form technically could be used, but it seems all our
> uses of it are ones we could convert to the simpler one...).

Having a loose rule and inspecting its results carefully can work, but
having such a rule published and run by CI pipelines etc. risks adding
defects and false positives.  I'm not sure it's much of a win to come
up with a loose Coccinelle rule instead of grepping and hand-editing if
great care needs to be taken in both cases.

>
> diff --git a/add-patch.c b/add-patch.c
> index 509ca04456b..eff338d3901 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -915,10 +915,9 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>  	file_diff->hunk_nr += splittable_into - 1;
>  	ALLOC_GROW(file_diff->hunk, file_diff->hunk_nr, file_diff->hunk_alloc);
>  	if (hunk_index + splittable_into < file_diff->hunk_nr)
> -		memmove(file_diff->hunk + hunk_index + splittable_into,
> -			file_diff->hunk + hunk_index + 1,
> -			(file_diff->hunk_nr - hunk_index - splittable_into)
> -			* sizeof(*hunk));
> +		MOVE_ARRAY(file_diff->hunk + hunk_index + splittable_into,
> +			   file_diff->hunk + hunk_index + 1,
> +			   file_diff->hunk_nr - hunk_index - splittable_into);

OK.

>  	hunk = file_diff->hunk + hunk_index;
>  	hunk->splittable_into = 1;
>  	memset(hunk + 1, 0, (splittable_into - 1) * sizeof(*hunk));
> diff --git a/builtin/mv.c b/builtin/mv.c
> index 83a465ba831..f6187d1264a 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -282,14 +282,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  remove_entry:
>  		if (--argc > 0) {
>  			int n = argc - i;
> -			memmove(source + i, source + i + 1,
> -				n * sizeof(char *));
> -			memmove(destination + i, destination + i + 1,
> -				n * sizeof(char *));
> -			memmove(modes + i, modes + i + 1,
> -				n * sizeof(enum update_mode));
> -			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
> -				n * sizeof(char *));
> +			MOVE_ARRAY(source + i, source + i + 1, n);
> +			MOVE_ARRAY(destination + i, destination + i + 1, n);
> +			MOVE_ARRAY(modes + i, modes + i + 1, n);
> +			MOVE_ARRAY(submodule_gitfile + i,
> +				   submodule_gitfile + i + 1, n);

A simpler approach could be to mark the entries as invalid somehow and
skip them in the final loop instead of compacting the array like that.
We don't shrink the allocation anyway, so there's no need to move half
the entries on average on each such a delete.

>  			i--;
>  		}
>  	}
> diff --git a/commit.c b/commit.c
> index 1fb1b2ea90c..c23d3e3678f 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -151,10 +151,9 @@ int register_commit_graft(struct repository *r, struct commit_graft *graft,
>  		   r->parsed_objects->grafts_alloc);
>  	r->parsed_objects->grafts_nr++;
>  	if (pos < r->parsed_objects->grafts_nr)
> -		memmove(r->parsed_objects->grafts + pos + 1,
> -			r->parsed_objects->grafts + pos,
> -			(r->parsed_objects->grafts_nr - pos - 1) *
> -			sizeof(*r->parsed_objects->grafts));
> +		MOVE_ARRAY(r->parsed_objects->grafts + pos + 1,
> +			   r->parsed_objects->grafts + pos,
> +			   r->parsed_objects->grafts_nr - pos - 1);

Is that -1 correct?  Yes, it's needed because grafts_nr is incremented
already above, before making room and actually adding the the new item.
Confused me for a minute.

>  	r->parsed_objects->grafts[pos] = graft;
>  	unparse_commit(r, &graft->oid);
>  	return 0;
> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
> index 9a4f00cb1bd..988ff9a3fda 100644
> --- a/contrib/coccinelle/array.cocci
> +++ b/contrib/coccinelle/array.cocci
> @@ -73,6 +73,15 @@ expression n;
>  + MOVE_ARRAY(dst, src, n);
>  )
>
> +@@
> +expression D;
> +expression S;
> +expression N;
> +expression Z;
> +@@
> +- memmove(D, S, (N) * Z);
> ++ MOVE_ARRAY(D, S, N);
> +
>  @@
>  type T;
>  T *ptr;