Re: [PATCH v2 11/17] mktree: overwrite duplicate entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Victoria Dye via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Victoria Dye <vdye@xxxxxxxxxx>
>
> If multiple tree entries with the same name are provided as input to
> 'mktree', only write the last one to the tree. Entries are considered
> duplicates if they have identical names (*not* considering mode); if a blob
> and a tree with the same name are provided, only the last one will be
> written to the tree. A tree with duplicate entries is invalid (per 'git
> fsck'), so that condition should be avoided wherever possible.

The "should be avoided" in the last sentence can be satisified
either by the callers being extra careful, or the callee ignoring
earlier entries with the same path.  I do not have a strong
objection against allowing looser callers, but if that is what is
going on, perhaps

	By teaching "mktree" to ignore the earlier entries for the
        same path in the input, the callers can be more casual about
        sending duplicate entries in order to avoid creating an
        invalid tree objects.

is a more honest justification for this setp?

> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index 5f3a6dfe38e..cf1fd82f754 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -54,7 +54,8 @@ cannot be represented in a tree object. The command will fail without
>  writing the tree if a higher order stage is specified for any entry.
>  
>  The order of the tree entries is normalized by `mktree` so pre-sorting the
> -input by path is not required.
> +input by path is not required. Multiple entries provided with the same path
> +are deduplicated, with only the last one specified added to the tree.

OK.

>  struct tree_entry {
> +	/* Internal */
> +	size_t order;
> +
>  	unsigned mode;
>  	struct object_id oid;
>  	int len;
> @@ -74,15 +77,49 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
>  	ent->len = len;
>  	oidcpy(&ent->oid, oid);
>  
> +	ent->order = arr->nr;
>  	tree_entry_array_push(arr, ent);
>  }
>  
> -static int ent_compare(const void *a_, const void *b_)
> +static int ent_compare(const void *a_, const void *b_, void *ctx)
>  {
> +	int cmp;
>  	struct tree_entry *a = *(struct tree_entry **)a_;
>  	struct tree_entry *b = *(struct tree_entry **)b_;
> -	return base_name_compare(a->name, a->len, a->mode,
> -				 b->name, b->len, b->mode);
> +	int ignore_mode = *((int *)ctx);
> +
> +	if (ignore_mode)
> +		cmp = name_compare(a->name, a->len, b->name, b->len);
> +	else
> +		cmp = base_name_compare(a->name, a->len, a->mode,
> +					b->name, b->len, b->mode);
> +	return cmp ? cmp : b->order - a->order;
> +}

Having two similar functions that could go out of sync has bothered
me somewhat.  We could instead do

	int a_mode = ignore_mode ? 0 : a->mode;
	int b_mode = ignore_mode ? 0 : b->mode;
	cmp = base_name_compare(a->name, a->len, a_mode,
				b->name, b->len, b_mode);

but that should be done by rewriting name_compare() in terms of
base_name_compare(), which will help more callers, not just this
one.

> +static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> +{
> +	size_t count = arr->nr;
> +	struct tree_entry *prev = NULL;
> +
> +	int ignore_mode = 1;
> +	QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);

Swap the decl for ignore_mode and the blank line above it?

If the callback context only needs a single bit, ent_compare() could
just use the NULL-ness of ctx as "do we want to ignore mode?" bit.

> +	arr->nr = 0;
> +	for (size_t i = 0; i < count; i++) {
> +		struct tree_entry *curr = arr->entries[i];
> +		if (prev &&
> +		    !name_compare(prev->name, prev->len,
> +				  curr->name, curr->len)) {
> +			FREE_AND_NULL(curr);
> +		} else {
> +			arr->entries[arr->nr++] = curr;
> +			prev = curr;
> +		}
> +	}

As long as this is done for a single tree (i.e. the paths do not
have any slashes in them), this "sort them all and keep the last
one" is a good strategy.

> +	/* Sort again to order the entries for tree insertion */
> +	ignore_mode = 0;
> +	QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);

OK.  We from time to time find need to do this, and I always regret
that we didn't design the sort order of paths in a tree (and in the
index) like so [*].  But that is almost 20 years too late ;-).

Looking good.


[Footnote]

 * A directory entry $T should have sorted after a non-directory
   entry $T but before any non-directory entry whose path has $T
   as its prefix (e.g. even a blob whose path is $T + "\001" should
   sort after a tree $T).  That way we didn't have to worry about a
   blob at ($T + '-') sorting before a tree at $T but a blob at ($T
   + '0') sorting after that tree.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux