Re: [PATCH 2/2] lock_ref_sha1_basic: handle REF_NODEREF with invalid refs

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Tue, 12 Jan 2016 05:55:25 +0100

On 01/11/2016 04:52 PM, Jeff King wrote:
> We sometimes call lock_ref_sha1_basic both with REF_NODEREF
> to operate directly on a symbolic ref.

^^^ This sentence seems to be missing some words.

>                                        This is used, for
> example, to move to a detached HEAD, or when updating
> the contents of HEAD via checkout or symbolic-ref.
> 
> However, the first step of the function is to resolve the
> refname to get the "old" sha1, and we do so without telling
> resolve_ref_unsafe() that we are only interested in the
> symref. As a result, we may detect a problem there not with
> the symref itself, but with something it points to.
> 
> The real-world example I found (and what is used in the test
> suite) is a HEAD pointing to a ref that cannot exist,
> because it would cause a directory/file conflict with other
> existing refs.  This situation is somewhat broken, of
> course, as trying to _commit_ on that HEAD would fail. But
> it's not explicitly forbidden, and we should be able to move
> away from it. However, neither "git checkout" nor "git
> symbolic-ref" can do so. We try to take the lock on HEAD,
> which is pointing to a non-existent ref. We bail from
> resolve_ref_unsafe() with errno set to EISDIR, and the lock
> code thinks we are attempting to create a d/f conflict.
> 
> Of course we're not. The problem is that the lock code has
> no idea what level we were at when we got EISDIR, so trying
> to diagnose or remove empty directories for HEAD is not
> useful.
> 
> The most obvious solution would be to call
> resolve_ref_unsafe() with RESOLVE_REF_NO_RECURSE, so that we
> never look beyond the symref (and any problems we find must
> be attributable to it). However, that means we would not
> correctly gather the "old" sha1. We do not typically care
> about it for locking purposes with a symref (since the
> symref has no value on its own), but it does affect what we
> write into the HEAD reflog.
> 
> Another possibility is to avoid the d/f check when
> REF_NORECURSE is set. But that would mean we fail to notice
> a real d/f conflict. This is impossible with HEAD, but we
> would not want to create refs/heads/origin/HEAD.lock if we
> already have refs/heads/origin/HEAD/foo.
> 
> So instead, we attempt to resolve HEAD fully to get the old
> sha1, and only if that fails do we fallback to a
> non-recursive resolution. We lose nothing to the fallback,
> since we know the ref cannot be resolved, and thus we have
> no old sha1 in the first place. And we still get the benefit
> of the d/f-checking for the symref itself.
> 
> This does mean an extra round of filesystem lookups in some
> cases, but they should be rare. It only kicks in with
> REF_NODEREF, and then only when the existing ref cannot be
> resolved.
> 
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> This is prepared on top of the jk/symbolic-ref topic. As shown by the
> tests, though, this can be triggered when checking out a detached HEAD,
> so it existed prior to that. The fix is independent, but some of the
> additions to the test suite rely on that topic.
> 
> I can split it into separate patches, but I'm not sure it's worth the
> trouble. The only case I found that triggers it is quite obscure, and
> prior to the jk/symbolic-ref topic, you can recover using a non-detached
> checkout (or symbolic-ref).
> 
>  refs/files-backend.c             |  4 ++++
>  t/t1401-symbolic-ref.sh          |  7 +++++++
>  t/t2011-checkout-invalid-head.sh | 20 ++++++++++++++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 180c837..ea67d82 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1901,6 +1901,10 @@ static struct ref_lock *lock_ref_sha1_basic(const char *refname,
>  
>  	refname = resolve_ref_unsafe(refname, resolve_flags,
>  				     lock->old_oid.hash, &type);
> +	if (!refname && (flags & REF_NODEREF))
> +		refname = resolve_ref_unsafe(orig_refname,
> +					     resolve_flags | RESOLVE_REF_NO_RECURSE,
> +					     lock->old_oid.hash, &type);
> [...]

The main risk for this change would be that this new recovery code
allows the function to continue, but one of the outputs of the second
function invocation is not correct for the code that follows. Let me
think out loud:

* refname -- now will be equal to orig_refname. I think the main effect
is that it will be passed to verify_refname_available_dir(). This seems
to be what we want.

* type -- now reflects orig_refname; i.e., usually REF_ISSYMREF. This
also seems correct.

* lock->old_oid.hash -- is now ZEROS. This might get compared to the
caller's old_sha1 in verify_lock(), and it will also be written to the
reflog as the "old" value. I think this is also what we want.

So this change looks good to me.

Thanks for the good catch and especially the awesome commit message!

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html