Re: [PATCH 01/11] t: add helpers to test for reference existence

Patrick Steinhardt <ps@xxxxxx> · Mon, 23 Oct 2023 15:58:46 +0200

On Wed, Oct 18, 2023 at 09:06:50AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > There are two major ways to check for the existence of a reference in
> > our tests:
> >
> >     - `git rev-parse --verify` can be used to check for existence of a
> >       reference. This only works in the case where the reference is well
> >       formed though and resolves to an actual object ID. This does not
> >       work with malformed reference names or invalid contents.
> >
> >     - `test_path_is_file` can be used to check for existence of a loose
> >       reference if it is known to not resolve to an actual object ID. It
> >       by necessity reaches into implementation details of the reference
> >       backend though.
> 
> True.  It would be ideal if we can limit the use of latter when we
> _care_ how the ref is stored (e.g., "we expect it to be stored as a
> loose ref, not packed").  "The ref R at must be pointing at the
> commit X" is better asserted by using the former (or "git show-ref")
> as we not just only want to see the .git/refs/R file holding the
> object name X, but also want to see the production git tools observe
> the same---if what rev-parse or show-ref observes is different from
> the expected state and they say ref R does not point at commit X, we
> should complain (rev-parse or show-ref may be broken in the version
> of Git being tested, but we can assume that their breakage will be
> caught elsewhere in the test suite as well, so as long as we trust
> them, using them as the validator is better than going into the
> implementation detail and assuming things like "new refs always
> appear as a loose ref" that we might want to change in the future).
> 
> > Similarly, there are two equivalent ways to check for the absence of a
> > reference:
> >
> >     - `test_must_fail git rev-parse` can be used to check for the
> >       absence of a reference. It could fail due to a number of reasons
> >       though, and all of these reasons will be thrown into the same bag
> >       as an absent reference.
> >
> >     - `test_path_is_missing` can be used to check explicitly for the
> >       absence of a loose reference, but again reaches into internal
> >       implementation details of the reference backend.
> >
> > So both our tooling to check for the presence and for the absence of
> > references in tests is lacking as either failure cases are thrown into
> > the same bag or we need to reach into internal implementation details of
> > the respective reference backend.
> 
> > Introduce a new subcommand for our ref-store test helper that explicitly
> > checks only for the presence or absence of a reference. This addresses
> > these limitations:
> >
> >     - We can check for the presence of references with malformed names.
> 
> But for the purpose of tests, we can control the input.  When we
> perform an operation that we expect a ref R to be created, we would
> know R is well formed and we can validate using a tool that we know
> would be broken when fed a malformed name.  So I do not see this as
> a huge "limitation".

This is explicitly about the case where such a ref R is not well-formed
though. This limitation was mostly a problem in t1430-bad-ref-name.sh,
which verifies many such scenarios.

> >     - We can check for the presence of references that don't resolve.
> 
> Do you mean a dangling symbolic ref?  We are using a wrong tool if
> you are using rev-parse for that, aren't we?  Isn't symbolic-ref
> there for us for this exact use case?  That is

Again, t1430-bad-ref-name.sh has been the inspiration for this:

```
$ git symbolic-ref refs/heads/bad...name refs/heads/master
$ git symbolic-ref refs/heads/bad...name
fatal: No such ref: refs/heads/bad...name
```

The mismatch that you can write but not read the reference is kind of
astonishing though. We could fix this limitation, but I think there were
more usecases than only bad reference names. I honestly can't quite
remember right now.

> >     - We can explicitly handle the case where a reference is missing by
> >       special-casing ENOENT errors.
> 
> You probably know the error conditions refs_read_raw_ref() can be
> broken better than I do, but this feels a bit too intimate with how
> the method for the files backend happens to be implemented, which at
> the same time, can risk that [a] other backends can implement their
> "ref does not resolve to an object name---is it because it is
> missing?" report incorrectly and [b] we would eventually want to
> know error conditions other than "the ref requested is missing" and
> at that point we would need more "special casing", which does not
> smell easy to scale.

We actually rely on some of these error codes to be consistent across
backends. E.g. "refs.c" itself has higher-level logic that verifies
specific error codes when resolving symrefs. And as we explicitly made
these error codes part of the API design with `refs_read_raw_ref()` my
assumption is that any other backend needs to match the behaviour here.

I also think that this is a somewhat sane assumption to make. While it
may not be a good idea to tie this to standard error codes, the backend
should indeed be able to signal specific error cases to the caller. We
could refactor this to be more explicit about the expected failure cases
in the form of specialized error codes. But I'm not sure whether that
would be worth it for now, but it's sure something to keep in mind for
future patch series.

> >     - We don't need to reach into implementation details of the backend,
> >       which would allow us to use this helper for the future reftable
> >       backend.
> 
> This is exactly what we want to aim for.
> 
> > Next to this subcommand we also provide two wrappers `test_ref_exists`
> > and `test_ref_missing` that make the helper easier to use.
> 
> Hmmmm.  This may introduce "who watches the watchers" problem, no?
> I briefly wondered if a better approach is to teach the production
> code, e.g., rev-parse, to optionally give more detailed diag.  It
> essentially may be the same (making the code in test-ref-store.c
> added by this patch available from rev-parse, we would easily get
> there), so I do not think the distinction matters.
> 
> > diff --git a/t/README b/t/README
> > index 61080859899..779f7e7dd86 100644
> > --- a/t/README
> > +++ b/t/README
> > @@ -928,6 +928,15 @@ see test-lib-functions.sh for the full list and their options.
> >     committer times to defined state.  Subsequent calls will
> >     advance the times by a fixed amount.
> >  
> > + - test_ref_exists <ref>, test_ref_missing <ref>
> > +
> > +   Check whether a reference exists or is missing. In contrast to
> > +   git-rev-parse(1), these helpers also work with invalid reference
> > +   names and references whose contents are unresolvable. The latter
> > +   function also distinguishes generic errors from the case where a
> > +   reference explicitly doesn't exist and is thus safer to use than
> > +   `test_must_fail git rev-parse`.
> > +
> >   - test_commit <message> [<filename> [<contents>]]
> >  
> >     Creates a commit with the given message, committing the given
> > diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
> > index 48552e6a9e0..7400f560ab6 100644
> > --- a/t/helper/test-ref-store.c
> > +++ b/t/helper/test-ref-store.c
> > @@ -1,6 +1,6 @@
> >  #include "test-tool.h"
> >  #include "hex.h"
> > -#include "refs.h"
> > +#include "refs/refs-internal.h"
> >  #include "setup.h"
> >  #include "worktree.h"
> >  #include "object-store-ll.h"
> > @@ -221,6 +221,30 @@ static int cmd_verify_ref(struct ref_store *refs, const char **argv)
> >  	return ret;
> >  }
> >  
> > +static int cmd_ref_exists(struct ref_store *refs, const char **argv)
> > +{
> > +	const char *refname = notnull(*argv++, "refname");
> > +	struct strbuf unused_referent = STRBUF_INIT;
> > +	struct object_id unused_oid;
> > +	unsigned int unused_type;
> > +	int failure_errno;
> > +
> > +	if (refs_read_raw_ref(refs, refname, &unused_oid, &unused_referent,
> > +			      &unused_type, &failure_errno)) {
> > +		/*
> > +		 * We handle ENOENT separately here such that it is possible to
> > +		 * distinguish actually-missing references from any kind of
> > +		 * generic error.
> > +		 */
> > +		if (failure_errno == ENOENT)
> > +			return 17;
> 
> Can we tell between the cases where the ref itself is missing, and
> the requested ref is symbolic and points at a missing ref?  This
> particular case might be OK, but there may other cases where this
> "special case" may not be narrow enough.

Yes, because `refs_read_raw_ref()` doesn't concern itself with recursive
resolving of the reference, which is done at a higher level. It only
reads and parses the reference without caring whether their target
actually exists.

> As long we are going to spend cycles to refine the classification of
> error conditions, which is a very good thing to aim for the reason
> described in the proposed log message, namely "rev-parse can fail
> for reasons other than the ref being absent", I have to wonder again
> that the fruit of such an effort should become available in the
> production code, instead of being kept only in test-tool.

Fair enough, I'm happy to lift this up into production code. I just
didn't think this would be all that useful in general, but I can see
that somebody might want to use such functionality as part of our
plumbing interfaces.

I wonder what the best spot would be for it. Should we add a new
`--exists` switch to git-rev-parse(1)?

Patrick
Attachment:
signature.asc

Description: PGP signature