Re: [PATCH v7 04/33] files-backend: break out ref reading

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/01/2016 01:52 AM, David Turner wrote:
> Refactor resolve_ref_1 in terms of a new function read_raw_ref, which
> is responsible for reading ref data from the ref storage.
> 
> Later, we will make read_raw_ref a pluggable backend function, and make
> resolve_ref_unsafe common.
> 
> Testing done: Hacked in code to run both old and new version of
> resolve_ref_1 and compare all outputs, failing dramatically if outputs
> differed.  Ran test suite.

I spent an inordinate amount of time trying to review this patch. It is
much too big and does too much at the same time. And, in fact, it makes
some mistakes, which were impossible to see until I picked the patch
apart into smaller steps.

The reference-reading code before this patch wasn't structured very
well. Part of the reason that the code is a mess is that it is pretty
intricate and has to get things just right to avoid race conditions. But
the rest of the reason is that it was overdue for a refactoring, and
this patch shows a great way forward.

Because of the intricacy of this code, it is really important to do a
careful job changing it. To me that means refactoring in the smallest
possible steps, ideally so that each step is obviously correct.

So to review your patch, I picked it apart into tiny preparatory
refactorings, followed by the main patch (the extraction of the function
read_raw_ref()), followed by some more cleanups. When I did so I found
that there were some differences between my end product and yours. Some
of these introduce minor bugs, so it is worth fixing them.

I've annotated your patch below, but in my opinion a better way forward
would be to commit not this single giant patch, but rather the
picked-apart version, which also addresses my comments below. I just
submitted that patch series [1]. It's twenty-one patches(!), though a
bit over half of them do things that go beyond this patch. In the
future, it would help the review process if you would submit smaller
patches that do a single thing at a time.

I've reviewed the patches that precede this one and they look fine to
me. I haven't reviewed this version of patches 05 through 33 yet.

Michael

[1] http://mid.gmane.org/cover.1458723959.git.mhagger@xxxxxxxxxxxx

> Signed-off-by: David Turner <dturner@xxxxxxxxxxxxxxxx>
> Helped-by: Duy Nguyen <pclouds@xxxxxxxxx>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
>  refs/files-backend.c | 265 ++++++++++++++++++++++++++++++---------------------
>  1 file changed, 159 insertions(+), 106 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 9676ec2..8c6a58e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1369,12 +1369,11 @@ static struct ref_entry *get_packed_ref(const char *refname)
>  
>  /*
>   * A loose ref file doesn't exist; check for a packed ref.  The
> - * options are forwarded from resolve_safe_unsafe().
> + * options are forwarded from resolve_ref_unsafe().
>   */
>  static int resolve_missing_loose_ref(const char *refname,
> -				     int resolve_flags,
>  				     unsigned char *sha1,
> -				     int *flags)
> +				     unsigned int *flags)

This patch changes some flags variables from "int" to "unsigned int"
without discussion. I think the change is OK, but why not do it in a
separate patch?

>  {
>  	struct ref_entry *entry;
>  
> @@ -1390,64 +1389,48 @@ static int resolve_missing_loose_ref(const char *refname,
>  		return 0;
>  	}
>  	/* The reference is not a packed reference, either. */
> -	if (resolve_flags & RESOLVE_REF_READING) {
> -		errno = ENOENT;
> -		return -1;
> -	} else {
> -		hashclr(sha1);
> -		return 0;
> -	}
> +	errno = ENOENT;
> +	return -1;
>  }
>  
> -/* This function needs to return a meaningful errno on failure */
> -static const char *resolve_ref_1(const char *refname,
> -				 int resolve_flags,
> -				 unsigned char *sha1,
> -				 int *flags,
> -				 struct strbuf *sb_refname,
> -				 struct strbuf *sb_path,
> -				 struct strbuf *sb_contents)
> +/*
> + * Read a raw ref from the filesystem or packed refs file.
> + *
> + * If the ref is a sha1, fill in sha1 and return 0.
> + *
> + * If the ref is symbolic, fill in *symref with the referrent
> + * (e.g. "refs/heads/master") and return 0.  The caller is responsible
> + * for validating the referrent.  Set REF_ISSYMREF in flags.
> + *
> + * If the ref is neither a symbolic ref nor a sha1, it is broken.  Set
> + * REF_ISBROKEN in flags, set errno to EINVAL, and return -1.
> + *
> + * If the ref doesn't exist, set errno to ENOENT and return -1.
> + *
> + * If there is another error reading the ref, set errno appropriately and
> + * return -1.
> + *
> + * Backend-specific flags might be set in flags as well, regardless of
> + * outcome.
> + *
> + * sb_path is workspace: the caller should allocate and free it.
> + */
> +static int read_raw_ref(const char *refname, unsigned char *sha1,
> +			struct strbuf *symref, struct strbuf *sb_path,
> +			unsigned int *flags)
>  {
> -	int depth = MAXDEPTH;
> -	int bad_name = 0;
> -
> -	if (flags)
> -		*flags = 0;
> +	struct strbuf sb_contents = STRBUF_INIT;

I see you've pushed responsibility for managing the scratch space
sb_contents from resolve_ref_unsafe() down to read_raw_ref(). This seems
like a good idea to decouple the code better (albeit at the expense of
some extra memory allocations). But why don't you push responsibility
for managing sb_path down at the same time? It seems that the same
considerations would apply.

BTW if you do so, then resolve_ref_unsafe() becomes trivial and
resolve_ref_1() could easily be inlined into it.

> +	int ret = -1;
> +	const char *path;
> +	const char *buf;
>  
> -	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
> -		if (flags)
> -			*flags |= REF_BAD_NAME;
> +	strbuf_reset(sb_path);
> +	strbuf_git_path(sb_path, "%s", refname);
> +	path = sb_path->buf;
>  
> -		if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
> -		    !refname_is_safe(refname)) {
> -			errno = EINVAL;
> -			return NULL;
> -		}
> -		/*
> -		 * dwim_ref() uses REF_ISBROKEN to distinguish between
> -		 * missing refs and refs that were present but invalid,
> -		 * to complain about the latter to stderr.
> -		 *
> -		 * We don't know whether the ref exists, so don't set
> -		 * REF_ISBROKEN yet.
> -		 */
> -		bad_name = 1;
> -	}
>  	for (;;) {
> -		const char *path;
>  		struct stat st;
> -		char *buf;
>  		int fd;
> -
> -		if (--depth < 0) {
> -			errno = ELOOP;
> -			return NULL;
> -		}
> -
> -		strbuf_reset(sb_path);
> -		strbuf_git_path(sb_path, "%s", refname);
> -		path = sb_path->buf;
> -
>  		/*
>  		 * We might have to loop back here to avoid a race
>  		 * condition: first we lstat() the file, then we try
> @@ -1457,49 +1440,45 @@ static const char *resolve_ref_1(const char *refname,
>  		 * we don't want to report that as an error but rather
>  		 * try again starting with the lstat().
>  		 */
> -	stat_ref:
> +
>  		if (lstat(path, &st) < 0) {
>  			if (errno != ENOENT)
> -				return NULL;
> -			if (resolve_missing_loose_ref(refname, resolve_flags,
> -						      sha1, flags))
> -				return NULL;
> -			if (bad_name) {
> -				hashclr(sha1);
> -				if (flags)
> -					*flags |= REF_ISBROKEN;
> -			}
> -			return refname;
> +				break;
> +			if (resolve_missing_loose_ref(refname, sha1, flags))
> +				break;
> +			ret = 0;
> +			break;
>  		}
>  
>  		/* Follow "normalized" - ie "refs/.." symlinks by hand */
>  		if (S_ISLNK(st.st_mode)) {
> -			strbuf_reset(sb_contents);
> -			if (strbuf_readlink(sb_contents, path, 0) < 0) {
> +			strbuf_reset(&sb_contents);
> +			if (strbuf_readlink(&sb_contents, path, 0) < 0) {
>  				if (errno == ENOENT || errno == EINVAL)
>  					/* inconsistent with lstat; retry */
> -					goto stat_ref;
> +					continue;
>  				else
> -					return NULL;
> +					break;
>  			}
> -			if (starts_with(sb_contents->buf, "refs/") &&
> -			    !check_refname_format(sb_contents->buf, 0)) {
> -				strbuf_swap(sb_refname, sb_contents);
> -				refname = sb_refname->buf;
> +			if (starts_with(sb_contents.buf, "refs/") &&
> +			    !check_refname_format(sb_contents.buf, 0)) {
> +				strbuf_swap(&sb_contents, symref);
>  				if (flags)
>  					*flags |= REF_ISSYMREF;
> -				if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
> -					hashclr(sha1);
> -					return refname;
> -				}
> -				continue;
> +				ret = 0;
> +				break;
> +			} else {
> +				/* bogus symlink ref  */
> +				if (flags)
> +					*flags |= REF_ISBROKEN;
> +				break;

I think this is a functional change.

Some background: We still support writing symrefs as symlinks if
core.prefersymlinkrefs is set (the code is in create_ref_symlink()).
Such symlinks contain the full ref name (e.g., "refs/heads/foo") and are
read via the

    if starts_with(sb_contents.buf, "refs/")

block above.

I suppose that this code was used in the old days when "HEAD" was the
only symbolic reference, in which case the symlink would really point at
the file representing the referred-to loose reference. Nowadays symbolic
references can live elsewhere, and references can be packed, so the
symlink representing the symbolic reference might not even point at a
file at the filesystem level. But that's not really a problem; the OS
doesn't care what bytes you stick in a symlink.

Anyway, at this point in the code we're not talking about that kind of
symlink. We're talking about a symlink that might point legitimately at
another file at the filesystem level, even though its contents might be
something like "../foo/bar" that doesn't even look like a reference name.

The old resolve_ref_1() did NOT consider such a symlink to be bogus *if*
it pointed to a file. Instead the code fell through to a questionable
S_ISDIR() test followed by an attempt to read the loose reference file
directly through the symlink. Thus, for example, the old code would
resolve a symlink "refs/heads/foo" with contents "bar" to
"refs/heads/bar", albeit without marking it REF_ISSYMREF.

We might not want to continue supporting this kind of
symlink-as-reference. It seems pretty silly and broken to me (e.g., if
the referred-to reference is packed, it would break). But maybe the
old-timers can think of a reason to preserve this behavior. Let's have
that discussion.

Either way, it is NOT OK to change behavior like this in a big patch
without even mentioning it in the commit message. Ideally, the behavior
change should be done in a separate patch with an explicit justification.

>  			}
>  		}
>  
>  		/* Is it a directory? */
>  		if (S_ISDIR(st.st_mode)) {
>  			errno = EISDIR;
> -			return NULL;
> +			break;
>  		}
>  
>  		/*
> @@ -1510,35 +1489,110 @@ static const char *resolve_ref_1(const char *refname,
>  		if (fd < 0) {
>  			if (errno == ENOENT)
>  				/* inconsistent with lstat; retry */
> -				goto stat_ref;
> +				continue;
>  			else
> -				return NULL;
> +				break;
>  		}
> -		strbuf_reset(sb_contents);
> -		if (strbuf_read(sb_contents, fd, 256) < 0) {
> +		strbuf_reset(&sb_contents);
> +		if (strbuf_read(&sb_contents, fd, 256) < 0) {
>  			int save_errno = errno;
>  			close(fd);
>  			errno = save_errno;
> -			return NULL;
> +			break;
>  		}
>  		close(fd);
> -		strbuf_rtrim(sb_contents);
> +		strbuf_rtrim(&sb_contents);
> +		buf = sb_contents.buf;
> +		if (starts_with(buf, "ref:")) {
> +			buf += 4;
> +			while (isspace(*buf))
> +				buf++;
> +
> +			strbuf_reset(symref);
> +			strbuf_addstr(symref, buf);
> +			if (flags)


read_raw_ref() is only called with a valid flags pointer, so you can get
rid of these "if (flags)" guards (here and elsewhere).

In fact, the code in resolve_ref_1() could be simplified the same way if
resolve_ref_unsafe() (the only public entry point) would contain

int unused_flags;

if (!flags)
flags = &unused_flags;

but feel free to consider that outside of the scope of your patch series.


> +				*flags |= REF_ISSYMREF;
> +			ret = 0;
> +			break;
> +		}
>  
>  		/*
> -		 * Is it a symbolic ref?
> +		 * Please note that FETCH_HEAD has additional
> +		 * data after the sha.
>  		 */
> -		if (!starts_with(sb_contents->buf, "ref:")) {
> -			/*
> -			 * Please note that FETCH_HEAD has a second
> -			 * line containing other data.
> -			 */
> -			if (get_sha1_hex(sb_contents->buf, sha1) ||
> -			    (sb_contents->buf[40] != '\0' && !isspace(sb_contents->buf[40]))) {
> +		if (get_sha1_hex(buf, sha1) ||
> +		    (buf[40] != '\0' && !isspace(buf[40]))) {
> +			if (flags)
> +				*flags |= REF_ISBROKEN;
> +			errno = EINVAL;
> +			break;
> +		}
> +		ret = 0;
> +		break;
> +	}
> +
> +	strbuf_release(&sb_contents);
> +	return ret;
> +}
> +
> +/* This function needs to return a meaningful errno on failure */
> +static const char *resolve_ref_1(const char *refname,
> +				 int resolve_flags,
> +				 unsigned char *sha1,
> +				 int *flags,
> +				 struct strbuf *sb_refname,
> +				 struct strbuf *sb_path)
> +{
> +	int bad_name = 0;
> +	int symref_count;
> +
> +	if (flags)
> +		*flags = 0;
> +
> +	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
> +		if (flags)
> +			*flags |= REF_BAD_NAME;
> +
> +		if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
> +		    !refname_is_safe(refname)) {
> +			errno = EINVAL;
> +			return NULL;
> +		}
> +		/*
> +		 * dwim_ref() uses REF_ISBROKEN to distinguish between
> +		 * missing refs and refs that were present but invalid,
> +		 * to complain about the latter to stderr.
> +		 *
> +		 * We don't know whether the ref exists, so don't set
> +		 * REF_ISBROKEN yet.
> +		 */
> +		bad_name = 1;
> +	}
> +
> +	for (symref_count = 0; symref_count < MAXDEPTH; symref_count++) {
> +		unsigned int read_flags = 0;
> +
> +		if (read_raw_ref(refname, sha1, sb_refname, sb_path, &read_flags)) {
> +			int saved_errno = errno;
> +			if (flags)
> +				*flags |= read_flags;
> +			errno = saved_errno;
> +			if (bad_name) {
> +				hashclr(sha1);
>  				if (flags)
>  					*flags |= REF_ISBROKEN;
> -				errno = EINVAL;
> +			}
> +			if (resolve_flags & RESOLVE_REF_READING || errno != ENOENT) {
>  				return NULL;
> +			} else {
> +				hashclr(sha1);
> +				return refname;
>  			}
> +		}
> +		if (flags)
> +			*flags |= read_flags;
> +
> +		if (!(read_flags & REF_ISSYMREF)) {
>  			if (bad_name) {
>  				hashclr(sha1);
>  				if (flags)
> @@ -1546,44 +1600,43 @@ static const char *resolve_ref_1(const char *refname,
>  			}
>  			return refname;
>  		}
> -		if (flags)
> -			*flags |= REF_ISSYMREF;
> -		buf = sb_contents->buf + 4;
> -		while (isspace(*buf))
> -			buf++;
> -		strbuf_reset(sb_refname);
> -		strbuf_addstr(sb_refname, buf);
> +
>  		refname = sb_refname->buf;
>  		if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
>  			hashclr(sha1);
> +			if (bad_name && flags)
> +				*flags |= REF_ISBROKEN;
>  			return refname;
>  		}
> -		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
> +
> +		if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
>  			if (flags)
>  				*flags |= REF_ISBROKEN;
> -
>  			if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
> -			    !refname_is_safe(buf)) {
> +			    !refname_is_safe(refname)) {
>  				errno = EINVAL;
>  				return NULL;
>  			}
>  			bad_name = 1;
>  		}
>  	}
> +
> +	if (flags)
> +		*flags |= REF_ISBROKEN;
> +	return NULL;
>  }
>  
>  const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
>  			       unsigned char *sha1, int *flags)
>  {
>  	static struct strbuf sb_refname = STRBUF_INIT;
> -	struct strbuf sb_contents = STRBUF_INIT;
>  	struct strbuf sb_path = STRBUF_INIT;
>  	const char *ret;
>  
>  	ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
> -			    &sb_refname, &sb_path, &sb_contents);
> +			    &sb_refname, &sb_path);
> +
>  	strbuf_release(&sb_path);
> -	strbuf_release(&sb_contents);
>  	return ret;
>  }
>  
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]