Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory

Shuqi Liang wrote:
> 'git check-attr' cannot currently find attributes of a file within a
> sparse directory. This is due to .gitattributes files are irrelevant in
> sparse-checkout cone mode, as the file is considered sparse only if all
> paths within its parent directory are also sparse. 

If .gitattributes files are irrelevant in sparse-checkout cone mode, then
why are we changing the behavior? If you're challenging that assertion,
please state so clearly.

> In addition, searching for a .gitattributes file causes expansion of the sparse
> index, which is avoided to prevent potential performance degradation.

This isn't an unchangeable fact (as your implementation below shows).
Expanding the index is just the most straightforward approach; its
performance cost is (AFAICT) the reason used to justify not reading sparse
directory attributes in the past.

> 
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
> 
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
> 
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
> 
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
> 
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().

Makes sense to me.

> 
> Helped-by: Victoria Dye <vdye@xxxxxxxxxx>
> Signed-off-by: Shuqi Liang <cheskaqiqi@xxxxxxxxx>
> ---
>  attr.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..be06747b0d 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> +	 * If the pos value is negative, it means the path is not in the index. 
> +	 * However, the absolute value of pos minus 1 gives us the position where the path 
> +	 * would be inserted in lexicographic order. By subtracting another 1 from this 
> +	 * value (pos = -pos - 2), we find the position of the last index entry 
> +	 * which is lexicographically smaller than the provided path. This would be 
> +	 * the sparse directory containing the path.

This is a good explanation of what '-pos - 2' represents, but it doesn't
explain why we'd want that value. Could you add a bit of detail around
1) why we care whether 'pos' identifies a value that exists in the index or
not, and 2) why we're looking for the sparse directory containing the path?
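
To make the arithmetic concrete, here is a minimal standalone sketch. It
does not use Git's index code: 'toy_name_pos()' and 'toy_preceding_entry()'
are hypothetical helpers that search a sorted array of names and follow the
return convention your comment describes (position if found, otherwise
-(insertion point) - 1).

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for an index lookup: binary-search a sorted
 * array of entry names. Returns the position if 'path' is found,
 * otherwise -(insertion point) - 1.
 */
static int toy_name_pos(const char **names, int nr, const char *path)
{
	int lo = 0, hi = nr;

	assert(names && path && nr >= 0);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(path, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * Position of the last entry that sorts before 'path', or -1 if 'path'
 * is itself an entry or nothing sorts before it.
 */
static int toy_preceding_entry(const char **names, int nr, const char *path)
{
	int pos = toy_name_pos(names, nr, path);

	if (pos >= 0)
		return -1;	/* exact match: 'path' is an entry itself */
	return -pos - 2;	/* insertion point minus one; -1 if none */
}
```

With the entries "a.txt", "folder1/", "folder2/", "z.txt", looking up
"folder1/.gitattributes" lands on "folder1/", which is the only entry that
could be a sparse directory containing it. An exact match means the file is
tracked directly and there is no sparse directory to consult.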

>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	pos = index_name_pos_sparse(istate, path, strlen(path));
> +	pos = - pos - 2;

nit: don't add the space between '-' and 'pos'. This should be:

	pos = -pos - 2;

>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> -	}
> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {

Typically, we try to put the less expensive operation first in a condition
like this (if the first part of the condition is 'false', the second part
won't be evaluated). 'path_in_cone_mode_sparse_checkout()' is more expensive
than a simple numerical check, so this should probably be:

	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate)) {

But on a more general note, why check 'path_in_cone_mode_sparse_checkout()'
at all? The goal is to determine whether 'path' is inside a sparse
directory, so first you search the index to find where that directory would
be, then - if 'path' isn't in the sparse-checkout cone - check whether the
index entry you found is a sparse directory. But sparse directories can't
exist within the sparse-checkout cone in the first place, so the
'path_in_cone_mode_sparse_checkout()' check is redundant.

Instead, 'path_in_cone_mode_sparse_checkout()' (and probably
'istate->sparse_index', since sparse directories can't exist if the index
isn't sparse) could be used to avoid calculating 'index_name_pos_sparse()'
in the first place; the index search operation is generally more expensive
than 'path_in_cone_mode_sparse_checkout()', especially when sparse-checkout
is disabled entirely.
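
Roughly, the ordering I have in mind looks like the sketch below. It is a
toy model only: 'struct toy_index' and the 'toy_*()' functions are made up
for illustration and are not Git's types or APIs. The 'searches' counter
exists just to show when the more expensive lookup is skipped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, pared-down stand-ins; none of this is Git's real API. */
struct toy_index {
	int sparse_index;		/* nonzero if sparse dirs may exist */
	const char *cone_prefix;	/* the one directory inside the cone */
	int searches;			/* number of index searches performed */
};

/* Cheap check: is 'path' under the in-cone directory? */
static int toy_path_in_cone(const struct toy_index *idx, const char *path)
{
	return !strncmp(path, idx->cone_prefix, strlen(idx->cone_prefix));
}

/* Stand-in for the comparatively expensive index search. */
static int toy_index_search(struct toy_index *idx, const char *path)
{
	(void)path;
	idx->searches++;
	return 0;	/* pretend entry 0 precedes 'path' */
}

/*
 * Return the candidate sparse directory position for 'path', or -1
 * when no sparse directory can contain it. The cheap guards run first,
 * so the search is skipped whenever they already rule it out.
 */
static int toy_find_sparse_dir(struct toy_index *idx, const char *path)
{
	assert(idx && path);
	if (!idx->sparse_index || toy_path_in_cone(idx, path))
		return -1;
	return toy_index_search(idx, path);
}
```

With a non-sparse index, or with a path inside the cone, the function
returns before touching the index at all.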

> +		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> +			return NULL;
>  
> -	return read_attr_from_buf(buf, path, flags);
> +		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {

All of these nested conditions could be simplified/collapsed into a single,
top-level condition:

	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos]))) {

IMO, this also more clearly reflects _why_ you'd want to enter this
condition and read from the index directly:

* If the path is not in the sparse-checkout cone
* AND the index entry preceding 'path' is a sparse directory
* AND the sparse directory is the prefix of 'path' (i.e., 'path' is in the
  directory) 
    -> Read from the sparse directory's tree

One other quick sanity check - for the sparse directory prefixing check to
work, 'path' needs to be a normalized path relative to the root of the repo.
Is that guaranteed to be the case here?

> +			const char *relative_path = path + ce_namelen(istate->cache[pos]);  

Here, you get the relative path within the sparse directory by skipping past
the sparse directory name in 'path'. If 'path' is normalized (see above),
this works. Nice!
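
To illustrate why normalization matters, here is a small standalone sketch
of the prefix check plus the relative path computation.
'toy_path_within_dir()' is a hypothetical helper, and it assumes the
directory name ends in '/' as a sparse directory entry's name does.

```c
#include <assert.h>
#include <string.h>

/*
 * If 'path' lies under the directory entry 'dir' (assumed to end in
 * '/'), return the remainder of 'path' relative to that directory.
 * Otherwise return NULL.
 */
static const char *toy_path_within_dir(const char *dir, const char *path)
{
	size_t len;

	assert(dir && path);
	len = strlen(dir);
	if (strncmp(dir, path, len))
		return NULL;	/* 'dir' is not a prefix of 'path' */
	return path + len;	/* skip past the directory name */
}
```

For "folder1/" and "folder1/sub/.gitattributes" this yields
"sub/.gitattributes". The trailing slash keeps "folder10/x" from matching,
but an unnormalized "./folder1/.gitattributes" fails the prefix check even
though it names a file in that directory.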

> +			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +		}
> +	} else {
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	return stack;
>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,



