Re: [PATCH 03/10] unpack-trees: add function to update ce_flags based on sparse patterns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyán ThÃi Ngác Duy wrote:

> The function will reconstruct directory structure from index and feed
> excluded_from_list() with proper dtype.

Okay, so I guess what this means is:

 Matching index entries against an excludes file currently has two
 problems.

 First, there's no function to do it.  Code paths (like sparse checkout)
 that wanted to try it would iterate over index entries and for each
 index entry pass that path to excluded_from_list().  But that is not
 how excluded_from_list() works; one is supposed to feed in each
 ancester of a path before a given path to find out if it was excluded
 because of some parent or grandparent matching a

	bigsubdirectory/

 pattern despite the path not matching any .gitignore pattern directly.

 Second, it's inefficient.  The excludes mechanism is supposed to let
 us block off vast swaths of the filesystem as uninteresting; separately
 checking every index entry doesn't fit that model.

 Introduce a new function to take care of both these problems.  This
 traverses the index in depth-first order (well, that's what order the
 index is in) to mark un-excluded entries.

 Maybe some day the in-core index format will be restructured to make
 this sort of operation easier.  Or maybe we will want to try some
 binary search based thing.  The interface is simple enough to allow
 all those things.  Example:

	clear_ce_flags(the_index.cache, the_index.cache_nr,
			CE_CANDIDATE, CE_CLEARME, exclude_list);

 would clear the CE_CLEARME flag on all index entries with
 CE_CANDIDATE flag and not matched by exclude_list.

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -834,6 +834,94 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>  	return mask;
>  }
>  
> +/*
> + * traverse the index, find every entry that matches according to
> + * o->el. Do "ce_flags &= ~clear_mask" on those entries. Return the
> + * number of traversed entries.
> + *
> + * If select_mask is non-zero, only entries whose ce_flags has on of
> + * those bits enabled are traversed.
> + */
> +static int clear_ce_flags_1(struct cache_entry **cache, int nr,
> +			    int prefix_len, int match_all,
> +			    int select_mask, int clear_mask,
> +			    struct unpack_trees_options *o)

o is only used for its excludes list.  Why not cut out the middle man
and take o->el as argument?

State (struct-worthy?): 
	cache     	pointer to an index entry
	prefix_len	an offset into its path
	match_all 	a boolean telling whether we have some skipping to do

The current path (called the prefix), including trailing '/', is

	cache[0]->name[0] .. cache[0]->name[prefix_len - 1]

> +{
> +	const char *name, *slash;
> +	struct cache_entry *ce = cache[0];
> +	const char *prefix = ce->name;
> +	int processed, original_nr = nr;
> +
> +	if (prefix_len && !match_all) {

Top level gets special handling.  Let's skip to it (see [*] for where we
were).

> +	while (nr) {

For each cache entry...

Might be nice to phrase this as a for loop.

	const char *prefix = cache[0]->name;
	struct cache_entry ** const cache_end = cache + nr;
	struct cache_entry **cep;
	for (cep = cache; cep != cache_end; ) {
		struct cache_entry *ce = *cep;

> +		if (select_mask && !(ce->ce_flags & select_mask)) {
> +			cache++;
> +			ce = *cache;
> +			nr--;
> +			continue;
> +		}

Some index entries are disqualified.

		if (select_mask)
			if (!(ce->ce_cflags & select_mask)) {
				cep++;
				continue;
			}

> +		if (prefix_len && strncmp(ce->name, prefix, prefix_len))
> +			break;

Entries not under prefix are handled by the caller.

		if (prefix_len)
			if (strncmp(ce->name, prefix, prefix_len))
				return cep - cache;	/* number processed */

> +		name = ce->name + prefix_len;
> +		slash = strchr(name, '/');
> +
> +		/* no slash, this is a file */
> +		if (!slash) {

Leaf!

			/* File or directory?  Check if it's a submodule. */
> +			int dtype = ce_to_dtype(ce);
> +			if (match_all ||
> +			    excluded_from_list(ce->name, ce_namelen(ce), name, &dtype, o->el) > 0)
> +				ce->ce_flags &= ~clear_mask;
> +			cache++;
> +			ce = *cache;
> +			nr--;
> +			continue;
> +		}

			if (excluded_block ||
			    excluded_from_list(...		/* Excluded? */
				ce->ce_flags &= ~clear_mask;
			cep++;
			continue;
		}

Putting the loop body in its own function could allow some nice depth
reduction.

> +
> +		/* has slash, this is a directory */
> +		processed = clear_ce_flags_1(cache, nr,
> +					     slash + 1 - ce->name, match_all,
> +					     select_mask, clear_mask, o);
> +		cache += processed;
> +		ce = *cache;
> +		nr -= processed;
> +	}
> +	return original_nr - nr;

		/* Non-leaf.  Recurse. */
		cep += clear_ce_ ...
	}

How about the other case, mentioned before (see [*] above)?

	assert(prefix_len);	/* Not toplevel but a subdirectory. */
	assert(!match_all);	/* Not wholesale excluded. */

> +		int dtype = DT_DIR;
> +		static char path[PATH_MAX];
> +		int pathlen = prefix_len - 1;
> +		const char *basename;
> +
> +		memcpy(path, prefix, pathlen);
> +		path[pathlen] = '\0';
> +		basename = strrchr(path, '/');

Use memrchr?

> +		basename = basename ? basename+1 : path;
> +		switch (excluded_from_list(path, pathlen, basename, &dtype, o->el)) {
> +		case 0:
> +			while (nr && !strncmp(ce->name, prefix, prefix_len)) {
> +				cache++;
> +				ce = *cache;
> +				nr--;
> +			}
> +			return original_nr - nr;

		case 0:	/* Included.  Great, no flag to clear. */
			for (; cep != cache_end; cep++)
				if (strncmp((*cep)->name, prefix, prefix_len))
					break;
			return cep - cache;

> +		case 1:
> +			match_all = 1;
> +			break;

		case 1:	/* Excluded.  Great, clear them wholesale. */
			match_all = 1;
			continue;

> +		default:	/* case -1, undecided */
> +			break;
> +		}

		case -1:	/* Undecided.
			continue;
		}
> +	}
> +	return original_nr - nr;
> +}

	}
	/* Ran off the end. */
	return cep - cache;
 }

That last bit could even longjmp() if expecting deeply nested
directory hierarchies.  (Sorry, joking, you should not do such an
awful thing unless timing it proves it useful.)

That was clean, thanks.

Hope some of my musings are useful anyway.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]