Re: [PATCH v4 2/3] attr.c: read attributes in a sparse directory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shuqi Liang wrote:
> Before this patch, git check-attr was unable to read the attributes from
> a .gitattributes file within a sparse directory. The original comment
> was operating under the assumption that users are only interested in
> files or directories inside the cones. Therefore, in the original code,
> in the case of a cone-mode sparse-checkout, we didn't load the
> .gitattributes file.
> 
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
> 
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
> 
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
> 
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
> 
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().
> 
> Modify previous tests so such difference is not considered as an error.

I don't quite follow what you mean here "such difference". I see that you
changed the 'test_expect_failure' to 'test_expect_success', but that's
because the attributes inside a sparse directory are now being read rather
than ignored.

The rest of the commit message looks good to me, though.

> 
> Helped-by: Victoria Dye <vdye@xxxxxxxxxx>
> Signed-off-by: Shuqi Liang <cheskaqiqi@xxxxxxxxx>
> ---
>  attr.c                                   | 60 +++++++++++++++++-------
>  t/t1092-sparse-checkout-compatibility.sh |  4 +-
>  2 files changed, 44 insertions(+), 20 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..7650f5481a 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,59 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos = -1;
> +	char normalize_path[PATH_MAX];
> +	const char *relative_path;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> +	 * When handling sparse-checkouts, .gitattributes files
> +	 * may reside within a sparse directory. We distinguish
> +	 * whether a path exists directly in the index or not by
> +	 * evaluating if 'pos' is negative.
> +	 * If 'pos' is negative, the path is not directly present
> +	 * in the index and is likely within a sparse directory.
> +	 * For paths not in the index, The absolute value of 'pos'
> +	 * minus 1 gives us the position where the path would be
> +	 * inserted in lexicographic order within the index.
> +	 * We then subtract another 1 from this value
> +	 * (pos = -pos - 2) to find the position of the last
> +	 * index entry which is lexicographically smaller than
> +	 * the path. This would be the sparse directory containing
> +	 * the path. By identifying the sparse directory containing
> +	 * the path, we can correctly read the attributes specified
> +	 * in the .gitattributes file from the tree object of the
> +	 * sparse directory.
>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
> +		pos = index_name_pos_sparse(istate, path, strlen(path));
>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> +		if (pos < 0)
> +			pos = -pos - 2;
>  	}

What 'pos' represents after this block is somewhat confusing/possibly
misleading. Consider the possible cases for 'path':

1. 'path' is inside the sparse-checkout cone.
2. 'path' is not inside the sparse-checkout cone, but it is in the index
   (i.e., the sparse directory has been expanded for some reason).
3. 'path' is inside a sparse directory.

In case #1, 'pos' will be '-1'. We never enter the
'!path_in_cone_mode_sparse_checkout()' if-statement, so the value is never
updated.

In case #2, 'pos' will be positive, since it exists in the index.

In case #3, the return value of 'index_name_pos_sparse()' will be negative,
then assigned the value of '-pos - 2' to (in many cases) create a positive
value pointing to the entry before the insertion point of 'path'.

Based on your explanation in the code comment above, I would assume that, if
'pos' was >= 0 coming out of this block, it represented the index position
of the potential sparse directory containing 'path'. But that's not true in
case #2.

To make the purpose of these variables clearer, instead of changing 'pos'
in-place, you could create a variable like 'sparse_dir_pos'. Like 'pos' is
now, it'd be initialized to '-1', but instead of unconditionally assigning
it the output of 'index_name_pos_sparse()', you'd only assign it if the
output of that function is negative:

	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
		int pos = index_name_pos_sparse(istate, path, strlen(path));
		if (pos < 0)
			sparse_dir_pos = -pos - 2;
	}

Then, you'd use 'sparse_dir_pos' in the condition below, and it'd always be
-1 when 'path' is definitely not contained in a sparse directory.

>  
> -	return read_attr_from_buf(buf, path, flags);
> +	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&

nit: the check of '!path_in_cone_mode_sparse_checkout()' is redundant, since
'pos' will always be '-1' if 'path_in_cone_mode_sparse_checkout()' is true.

> +	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
> +	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
> +	    !normalize_path_copy(normalize_path, path)) {

The normalization should come before the 'strncmp' so that you're comparing
'normalize_path' to the index entry.  

That said, I know I asked about path normalization earlier [1], but did you
confirm that 'path' _isn't_ normalized? Because if it's normalized by all of
the potential callers of this function, the check here would be unnecessary.

[1] https://lore.kernel.org/git/e4a77d0f-cf1d-ef76-fe26-ad5e58372a02@xxxxxxxxxx/

> +		relative_path = normalize_path + ce_namelen(istate->cache[pos]);

'relative_path' should be declared here, since it's not used outside this
block. 

> +		stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +	} else {
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	return stack;
>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 90633f383a..3f32c1f972 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2271,7 +2271,7 @@ test_expect_success 'check-attr with pathspec inside sparse definition' '
>  	test_all_match git check-attr -a --cached -- deep/a
>  '
>  
> -test_expect_failure 'check-attr with pathspec outside sparse definition' '
> +test_expect_success 'check-attr with pathspec outside sparse definition' '

Re: my suggested change to the test in patch 1 [2], when I applied _this_
patch, the test still failed for the 'sparse-index' case. It doesn't seem to
be a problem with your patch, but rather a bug in 'diff' that can be
reproduced with this test (using the infrastructure in t1092):

test_expect_failure 'diff --cached shows differences in sparse directory' '
	init_repos &&

	test_all_match git reset --soft update-folder1 &&
	test_all_match git diff --cached -- folder1/a
'

It's not immediately obvious to me what the problem is, but my guess is it's
some mis-handling of sparse directories in the internal diff machinery.
Given the likely complexity of the issue, I'd be content with you leaving
the 'diff --check' test as 'test_expect_failure' with a note about the bug
in 'diff' to fix later. Or, if you do want to investigate & fix it now, I
wouldn't be opposed to that either. :) 

[2] https://lore.kernel.org/git/c3ebe3b4-88b9-8ca2-2ee3-39a3e0d82201@xxxxxxxxxx/

>  	init_repos &&
>  
>  	echo "a -crlf myAttr" >>.gitattributes &&
> @@ -2288,7 +2288,7 @@ test_expect_failure 'check-attr with pathspec outside sparse definition' '
>  	test_all_match git check-attr  -a --cached -- folder1/a
>  '
>  
> -test_expect_failure 'diff --check with pathspec outside sparse definition' '
> +test_expect_success 'diff --check with pathspec outside sparse definition' '
>  	init_repos &&
>  
>  	write_script edit-contents <<-\EOF &&




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux