Re: [PATCH v2 16/17] mktree: allow deeper paths in input

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Victoria Dye via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Victoria Dye <vdye@xxxxxxxxxx>
>
> Update 'git mktree' to handle entries nested inside of directories (e.g.
> 'path/to/a/file.txt'). This functionality requires a series of changes:
>
> * In 'sort_and_dedup_tree_entry_array()', remove entries inside of
>   directories that come after them in input order.

So if you feed "folder/file.txt" and then "folder/", then
"folder/file.txt" gets removed?  It is unclear offhand why that is
the right thing to do.

> * Also in 'sort_and_dedup_tree_entry_array()', mark directories that contain
>   entries that come after them in input order (e.g., 'folder/' followed by
>   'folder/file.txt') as "need to expand".

Makes me wonder what happens to the object name recorded in the
input for "folder/" when something like this happens.  Ideally,
adding (or replacing) "folder/file.txt" to the set of files we
collected out of the base tree and the input stream for "folder/"
and writing that "folder/" out as a tree would result in a tree
whose object name exactly matches it (and we will error out if it
does not)?  Or is "need to expand" a signal that we should ignore
the object name in the input and we need to recompute it ourselves?
Again, it is unclear offhand what we want the "need to expand" is
used for.

> * In 'add_tree_entry_to_index()', if a tree entry is marked as "need to
>   expand", recurse into it with 'read_tree_at()' & 'build_index_from_tree'.
> * In 'build_index_from_tree()', if a user-specified tree entry is contained
>   within the current iterated entry, return 'READ_TREE_RECURSIVE' to recurse
>   into the iterated tree.

Surely, no matter what we choose to do to the object name given with
"folder/" when the input stream also talks about "folder/file.txt",
we'd need to recurse into the subtree.  But I think we need a higher
level description of what exactly we want to do to the multi-level
pathnames (i.e., "we want to handle them this way") before going into
the implementation details of how we do so (i.e., "hence we deal with
a multi-level pathname this way at these places in the code") in
these bullet points.

Especially, I do not quite understand what semantics the first
bullet point is trying to achieve.

> +Entries may use full pathnames containing directory separators to specify
> +entries nested within one or more directories. These entries are inserted
> +into the appropriate tree in the base tree-ish if one exists. Otherwise,
> +empty parent trees are created to contain the entries.

This still does not answer "how overlapping entries are handled?",
which is more complex than "for two exactly the same paths, the last
one wins", which is mentioned in the next paragraph.

>  The order of the tree entries is normalized by `mktree` so pre-sorting the
>  input by path is not required. Multiple entries provided with the same path
>  are deduplicated, with only the last one specified added to the tree.
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 96f06547a2a..74cec92a517 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> ...
> +static struct tree_entry *tree_entry_array_pop(struct tree_entry_array *arr)
> +{
> +	if (!arr->nr)
> +		return NULL;
> +	return arr->entries[--arr->nr];
> +}
> +
>  static void tree_entry_array_clear(struct tree_entry_array *arr, int free_entries)
>  {
>  	if (free_entries) {
> @@ -109,8 +118,10 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
>  
>  		if (!verify_path(ent->name, mode))
>  			die(_("invalid path '%s'"), path);
> -		if (strchr(ent->name, '/'))
> -			die("path %s contains slash", path);
> +
> +		/* mark has_nested_entries if needed */
> +		if (!arr->has_nested_entries && strchr(ent->name, '/'))
> +			arr->has_nested_entries = 1;

OK.

> @@ -168,6 +179,46 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
>  	ignore_mode = 0;
>  	QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);

We have already sorted the array twice (once before simple deduping,
once after).  So we now have a sorted array of "last one won" paths
and their object names.

> +	if (arr->has_nested_entries) {

We need to deal with overlapping entries if "has-nested-entries" is
true.  Even though our input here is sorted, we'd still pay
attention to the original input "order", which may be different from
the order in which we find these entries in arr->entries[].

OK.

> +		struct tree_entry_array parent_dir_ents = { 0 };
> +
> +		count = arr->nr;
> +		arr->nr = 0;
> +
> +		/* Remove any entries where one of its parent dirs has a higher 'order' */

Is "has a higher order" equivalent to "appears later in the input"?

More importantly, can the reason why they need to be removed be
clarified?  For simple deduping, we can say "we will make the last
one of multiple entries talking about the same path be used", and
that would be a sufficient explanation why we discard the one that
we have seen earlier and replace it with the newly seen one for the
same path.  Can a similar and simple explanation be given for the
behaviour this loop tries to achieve?  Is it "children, which appear
earlier in the input, of a directory, which appears later than these
children, are discarded, because the entry for the directory has a
concrete object name, and there is no point talking about individual
paths inside the directory.  We know what the tree object that would
contain these child paths hashes to in the end.  This is a natural
extension of 'last one wins' rule---a directory that comes later
trumps paths contained within that come earlier"?

> +		for (size_t i = 0; i < count; i++) {
> +			const char *skipped_prefix;
> +			struct tree_entry *parent;
> +			struct tree_entry *curr = arr->entries[i];
> +			int skip_entry = 0;
> +
> +			while ((parent = tree_entry_array_pop(&parent_dir_ents))) {
> +				if (!skip_prefix(curr->name, parent->name, &skipped_prefix))
> +					continue;
> +
> +				/* entry in dir, so we push the parent back onto the stack */
> +				tree_entry_array_push(&parent_dir_ents, parent);
> +
> +				if (parent->order > curr->order)
> +					skip_entry = 1;
> +				else
> +					parent->expand_dir = 1;
> +
> +				break;
> +			}
> +
> +			if (!skip_entry) {
> +				arr->entries[arr->nr++] = curr;
> +				if (S_ISDIR(curr->mode))
> +					tree_entry_array_push(&parent_dir_ents, curr);
> +			} else {
> +				FREE_AND_NULL(curr);
> +			}
> +		}
> +
> +		tree_entry_array_release(&parent_dir_ents, 0);
> +	}
> +
>  	/* Finally, initialize the directory-file conflict hash map */
>  	for (size_t i = 0; i < count; i++) {
>  		struct tree_entry *curr = arr->entries[i];




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux