Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shuqi Liang <cheskaqiqi@xxxxxxxxx> writes:

> 'git check-attr' cannot currently find attributes of a file within a
> sparse directory. This is due to .gitattributes files are irrelevant in
> sparse-checkout cone mode, as the file is considered sparse only if all
> paths within its parent directory are also sparse.

I do not quite understand what these two sentences want to say.  If
the attribute files are truly irrelevant then "cannot find" does not
matter, because there is no point in finding irrelevant things that
by definition will not affect the outcome of any commands at all,
no?

> In addition,
> searching for a .gitattributes file causes expansion of the sparse
> index, which is avoided to prevent potential performance degradation.

Does this sentence want to say that there is a price to pay, in
order to read an attribute file that is not part of the cones of
interest, that you first need to expand the sparse index?  I think
that is a given and I am not sure what the point of saying it is.

> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.

I agree.  Not reading attribute files correctly will lead to a bug.

Let me rephase what (I think) you wrote below to see if I understand
what you are doing correctly.

Suppose that sub1/.gitattributes need to be read, when the calling
command wants to know about attributes of sub1/file.  Imagine that
sub1/ and sub2/ are both outside the cones of interest. It would be
better not to expand sub2/ even though we need to expand sub1/.  Not
calling ensure_full_index() upfront and instead expanding the
necessary subdirectories on demand would be a good way to solve it.

Is that what going on?

> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
>
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
>
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
>
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().
>
> Helped-by: Victoria Dye <vdye@xxxxxxxxxx>
> Signed-off-by: Shuqi Liang <cheskaqiqi@xxxxxxxxx>
> ---
>  attr.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)

Thanks.

> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..be06747b0d 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.

Imagine that you have a tree with sub1/ outside the cones of
interest and sub2/ and sub9/ inside the cones of interest, and
further imagine that sub1/.gitattributes and sub2/.gitattributes
give attribute X to sub1/file and sub2/file respectively.  There
is no sub9/.gitattributes file.

Then "git ls-files ':(attr:X)sub[0-9]'" _could_ have two equally
sensible behaviours:

 (1) Only show sub2/file because sub1/ is outside the cones of
     interest and the user does not want to clutter the output
     from the parts of the tree they are not interested in.

 (2) Show both sub1/file and sub2/file, even though sub1/ is outside
     the cones of interest, in response to the fact that the mention
     of "sub[0-9]" on the command line is an explicit indication of
     interest by the user (it would become more and more interesting
     if the pathspec gets less specific, like ":(attr:X)" that is
     treewide, though).

The original comment seems to say that only behaviour (1) is
supported, but I wonder if we eventually want to support both,
choice made by the calling code (and perhaps options)?  In any case,
offering the choice of (2) is a good thing in the longer run.
Anyway...

> +	 * If the pos value is negative, it means the path is not in the index. 
> +	 * However, the absolute value of pos minus 1 gives us the position where the path 
> +	 * would be inserted in lexicographic order. By subtracting another 1 from this 
> +	 * value (pos = -pos - 2), we find the position of the last index entry 
> +	 * which is lexicographically smaller than the provided path. This would be 
> +	 * the sparse directory containing the path.

That is true only if the directory containing the .gitattribute file
is sparsified (e.g. sub1/.gitattributes does not appear in the index
but sub1/ does; sub2/.gitattributes however does appear in the index
and there is no sub2/ in the index).

If not, there are two cases:

 * sub2/.gitattributes does appear in the index (and there is no
   sub2/ in the index).  "pos = - pos - 2" computes a nonsense
   number in this case; hopefully we can reject it early by noticing
   that the resulting pos is negative.

 * sub9/.gitattributes does not belong to the project.  The pos is
   negative and "- pos - 2" does not poihnt at sub9/ (as it is not
   sparse).  Depending on what other paths appear in sub9/., the
   path that appears at (-pos-2) may be inside or outside sub9/.  In
   the worst case, it could be a sparsified directory that sorts
   directly before sub9/ (say, there is sub8/ that is sparse, which
   may have .gitattributes in it).  Would the updated code
   mistakenly check S_ISSPARSEDIR() on sub8/ that has no relevance
   when we are dealing with sub9/.gitattributes that does not exist?

> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	pos = index_name_pos_sparse(istate, path, strlen(path));
> +	pos = - pos - 2;
>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> -	}
> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {
> +		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> +			return NULL;

So earlier, the code, given say sub1/.gitattributes, checked if that
path is outside the cones of interest and skipped reading it.  But
the updated code tries to check the same "is it outside or inside?"
condition for sub1/ directory itself.  Does it make a practical
difference that you can demonstrate with a test?

I do not know if the updated code does the right thing for
sub2/.gitattributes (exists in a non-sparse directory) and
sub9/.gitattributes (does not exist in non-sparse directory),
though.

> +		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {

Don't compare with "==0", write !strncmp(...) instead.

> +			const char *relative_path = path + ce_namelen(istate->cache[pos]);  
> +			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +		}

If the earlier "- pos - 2" misidentified the parent sparse directory
entry in the index and the strncmp() noticed that mistake, we would
come here without reading any new attribute stack frame.  Don't we
need to fallback reading from the path in the correct directory that
is not at "- pos - 2"?

Let's imagine this case where sub/ is a directory outside the cones
of interest, and our sparse-index may or may not have it as a
directory in the index, and then the caller asks to read from the
"sub/sub1/.gitattributes" file.  Even when "sub/" is expanded in the
index, "sub/sub1/" may not and appear as a directory in the index.

The above "find relative_path and read from the tree object" code
would of course work when the direct parent directory of
".gitattributes" is visible in the index, but interestingly, it
would also work when it does not.  E.g. if "sub/" is represented as
a directory in the index, then asking for "sub1/.gitattributes"
inside the tree object of "sub/" would work as get_tree_entry() used
by read_attr_from_blob() would get to the right object recursively,
so that is nice.  If that is why "'- pos - 2' must be the directory
entry in the index that _would_ include $leadingpath/.gitattributes
regardless of how many levels of directory hierarchy there are
inside $leadingpath" idea was chosen, I'd have to say that it is
clever ;-)

I however find the "'- pos - 2' must be the directory entry in the
index" trick hard to reason about and explain.  I wonder if we write
this in a more straight-forward and stupid way, the result becomes
easier to read and less prone to future bugs...

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux