Re: [PATCH v2 5/5] Reftable support for git-core

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> · Tue, 28 Jan 2020 16:56:26 +0100

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
>
> > [...]
>
> I'll try to add my 2 cents to all of the XXX spots you flagged.
>

thanks! that was really helpful.

>
> Some thoughts on compatibility:
>
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
>
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
>
>   - we'd find one, which would be confusing and possibly wrong
>
>   - we wouldn't, in which case the error would be something like "no git
>     repository", and not "your git repository is too new"
>
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".

You have good points.

JGit currently implements what we have here, as this is what's spelled
out in the spec that Shawn posted  back in the day. It's probably
acceptable to this, though, as the reftable support has only landed in
JGit very recently and will probably appear very experimental to
folks.

How would the layout be then? We'll have

  HEAD - dummy file
  reftable/ - the tables
  refs/ - dummy dir

where shall we store the reftable list? maybe in a file called

  reftable-list

If we have both HEAD/refs + (refable/reftable-list), what should we
put there to ensure that no git version actually manages to use the
repository? (what happens if someone deletes the version setting from
the .git/config file)

> > +/* XXX which ordering are these? Newest or oldest first? */
> >  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
> >  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>
> The top one is chronological order (i.e., reading the file from start to
> finish), and the latter is reverse-chronological (seeking backwards in
> the file).

thanks, I put in a commit to clarify this.

> > +     err = merged_table_seek_ref(mt, &ri->iter, prefix);
> > +     /* XXX what to do with err? */
>
> Hmm, I don't think you can return an error from iterator_begin(). It
> would probably be OK to record the error in your local struct, and then
> later return ITER_ERROR from iterator_advance().

good point. I did that.

> > +     /* XXX - are we allowed to mutate the input data? */
> > +     qsort(transaction->updates, transaction->nr,
> > +           sizeof(struct ref_update *), ref_update_cmp);
>
> I don't offhand know of anything that would break, but it's probably a
> bad idea to do so. If you need a sorted view, can you make an auxiliary
> array-of-pointers? Also, the QSORT() macro is a little shorter and has
> some extra checks.

Done.

> In the "unsafe" version, the memory belongs to a static buffer inside
> the function. You shouldn't need to free it.

OK.

> > +static int
> > +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > [...]
> > +             /* XXX const? */
> > +             memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
>
> You'd want probably want to use hashcpy() here (or better yet, use
> "struct object_id" in ri->log, too, and then use oidcpy()).

It's more practical in the library to use pointers rather than fixed
size arrays, so it'll be hashcpy.

> But that raises a question: how ready are reftables to handle non-sha1
> object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> format actually has binary sha1s, right? In theory if those all become
> the_hash_algo->rawsz, then it might "Just Work" to read and write
> slightly larger entries.

The format fixes the reftable at 20 bytes, and there is not enough
framing information to just write more data. We'll have to encode the
hash size in the version number somehow, eg. we could use the  higher
order bit of the version byte to encode it, for example.

But it needs a new version of the spec. I think it's premature to do
this while v1 of reftable isn't in git-core yet.

> The committer entry we pass back to each_reflog_ent_fn should be the
> full "Human Name <email@host>".

thx, done.

> I agree that we'd generally want to expire and compact all entries at
> once. Unfortunately I think this per-ref interface is exposed by
> git-reflog. I.e., you can do "git reflog expire refs/heads/foo".
>
> Could we add an extra API call for "expire everything", and then:

I took note of your remarks, and added a BUG() for now.

I'll focus on getting the CI on gitgitgadget to build this code first.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado