Re: [PATCH v2 5/5] Reftable support for git-core

Jeff King <peff@xxxxxxxx> · Wed, 29 Jan 2020 05:47:54 -0500

On Tue, Jan 28, 2020 at 04:56:26PM +0100, Han-Wen Nienhuys wrote:

> JGit currently implements what we have here, as this is what's spelled
> out in the spec that Shawn posted back in the day. It's probably
> acceptable to change this, though, as the reftable support has only
> landed in JGit very recently and will probably appear very experimental
> to folks.
> 
> How would the layout be then? We'll have
> 
>   HEAD - dummy file
>   reftable/ - the tables
>   refs/ - dummy dir
> 
> Where shall we store the reftable list? Maybe in a file called
> 
>   reftable-list
> 
> If we have both HEAD/refs + (reftable/reftable-list), what should we
> put there to ensure that no git version actually manages to use the
> repository? (What happens if someone deletes the version setting from
> the .git/config file?)

Yeah, it would be nice to have something that an older version of Git
would totally choke on, but I'm not sure we have a lot of leeway. What
we put in HEAD has to be syntactically legitimate enough to appease
validate_headref(), so our options are either "ref:
refs/something/bogus" or an object hash that we don't have (e.g.,
0{40}). The former would be preferable because it would (in theory)
prevent us from writing to HEAD, as well.
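
To make the two options concrete, here is a minimal sketch of a purely
syntactic HEAD check. It is an approximation written for illustration, not
the actual validate_headref() code: the function name
head_content_looks_valid() is hypothetical, and the 40-hex-digit length
assumes SHA-1.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for a syntactic HEAD check.
 * Accepts either "ref:" + optional blanks + "refs/..." or a
 * 40-hex-digit object name (SHA-1 length assumed).
 */
static bool head_content_looks_valid(const char *buf)
{
	size_t i;

	assert(buf != NULL);

	if (!strncmp(buf, "ref:", 4)) {
		buf += 4;
		while (*buf == ' ' || *buf == '\t')
			buf++;
		/* Only the "refs/" prefix is checked, nothing after it. */
		return !strncmp(buf, "refs/", 5);
	}

	/* isxdigit() is false for NUL, so this never reads past the end. */
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return false;
	return buf[i] == '\0' || buf[i] == '\n';
}
```

Under this sketch both "ref: refs/something/bogus" and an all-zero hash
pass, because nothing checks that the target or the object exists.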

I wondered what would happen if you put in a syntactically invalid ref,
like "ref: refs/.not/.valid" (leading dots are not allowed in path
components of refnames). It does cause _some_ parts of Git to choke, but
sadly "git update-ref HEAD $sha1" actually writes to .git/refs/.not/.valid.

Even "refs/../../dangerous" doesn't give it pause. Yikes. It seems we're
pretty willing to accept symref destinations without further checking.
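
For illustration, this is roughly the kind of check one might expect a
symref destination to pass before it is used as a path. It is a sketch
only: symref_target_is_sane() is a hypothetical helper, not a function in
Git, and it covers just the two rules relevant here (a "refs/" prefix and
no empty or dot-leading path components).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: reject symref targets that are outside "refs/"
 * or that contain an empty component or one starting with a dot
 * (which also rules out "." and "..").
 */
static bool symref_target_is_sane(const char *target)
{
	const char *p = target;
	const char *slash;

	assert(target != NULL);

	if (strncmp(p, "refs/", 5) != 0)
		return false;

	/* p always points at the start of a path component here. */
	while (*p) {
		if (*p == '.' || *p == '/')
			return false;
		slash = strchr(p, '/');
		if (!slash)
			break;
		p = slash + 1;
		if (!*p)
			return false; /* trailing slash */
	}
	return true;
}
```

With such a check, "refs/heads/master" is accepted while both
"refs/.not/.valid" and "refs/../../dangerous" are rejected.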

Making "refs" a file instead of a directory does work nicely, as any
attempts to read or write would get ENOTDIR. And we can fool
is_git_directory() as long as it's marked executable. That's OK on POSIX
systems, but I'm not sure how it would work on Windows (or maybe it
would work just fine, since we presumably just say "yep, everything is
executable").
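
The ENOTDIR behavior is easy to see in isolation. The sketch below (POSIX
only, with a hypothetical function name and a throwaway directory under
/tmp) creates an executable regular file called "refs" and then tries to
create a ref beneath it, returning the errno of that attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns the errno from trying to create
 * "refs/heads/master" when "refs" is a regular file, 0 if the
 * open unexpectedly succeeds, or -1 if the setup itself fails.
 */
static int errno_writing_below_refs_file(void)
{
	char dir[] = "/tmp/refs-demo-XXXXXX";
	char refs[128], ref[160];
	int fd, err, n;

	if (!mkdtemp(dir))
		return -1;

	n = snprintf(refs, sizeof(refs), "%s/refs", dir);
	assert(n > 0 && (size_t)n < sizeof(refs));
	n = snprintf(ref, sizeof(ref), "%s/heads/master", refs);
	assert(n > 0 && (size_t)n < sizeof(ref));

	/* "refs" as an executable regular file instead of a directory. */
	fd = open(refs, O_WRONLY | O_CREAT | O_EXCL, 0755);
	if (fd < 0) {
		rmdir(dir);
		return -1;
	}
	close(fd);

	/* Any attempt to create a ref below it should fail. */
	fd = open(ref, O_WRONLY | O_CREAT, 0644);
	err = fd < 0 ? errno : 0;
	if (fd >= 0)
		close(fd);

	unlink(refs);
	rmdir(dir);
	return err;
}
```

On a POSIX system the function returns ENOTDIR, since "refs" is a path
prefix component that is not a directory.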

So perhaps that's enough, and what we put in HEAD won't matter (since
nobody will be able to write into refs/ anyway).

> > But that raises a question: how ready are reftables to handle non-sha1
> > object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> > format actually has binary sha1s, right? In theory if those all become
> > the_hash_algo->rawsz, then it might "Just Work" to read and write
> > slightly larger entries.
> 
> The format fixes the hash size at 20 bytes, and there is not enough
> framing information to just write more data. We'll have to encode the
> hash size in the version number somehow, e.g., we could use the
> high-order bit of the version byte.
> 
> But it needs a new version of the spec. I think it's premature to do
> this while v1 of reftable isn't in git-core yet.

I don't know that we technically need the reftables file to say how long
the hashes are. The git config will tell us which hash we're using, and
everything else is supposed to follow. So I think it would work OK as
long as you're able to be told by the rest of Git that hashes are N
bytes, and just use that to compute the fixed-size records.
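
As a sketch of what "being told the size" could look like: the structure
below is hypothetical and deliberately much simpler than the real reftable
layout. It treats a buffer as a packed array of raw object ids and takes
the id width from the caller (20 for SHA-1, 32 for SHA-256) rather than
from a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Raw hash sizes in bytes; the enum names are illustrative only. */
enum { DEMO_SHA1_RAWSZ = 20, DEMO_SHA256_RAWSZ = 32 };

/* Hypothetical packed array of raw object ids, NOT the reftable format. */
struct oid_table {
	const unsigned char *data;
	size_t len;        /* total bytes in data */
	size_t hash_rawsz; /* supplied by the caller, e.g. from the repo config */
};

/* Number of complete ids that fit in the buffer. */
static size_t oid_table_count(const struct oid_table *t)
{
	assert(t->hash_rawsz > 0);
	return t->len / t->hash_rawsz;
}

/* Pointer to the n-th id, or NULL if n is out of range. */
static const unsigned char *oid_table_get(const struct oid_table *t, size_t n)
{
	if (n >= oid_table_count(t))
		return NULL;
	return t->data + n * t->hash_rawsz;
}
```

The same 64-byte buffer then holds three complete 20-byte ids or two
32-byte ids, depending only on the size the caller passes in.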

That said, it might make for easier debugging if the reftables file
declares the size it assumes.

-Peff