Re: RefTree: Alternate ref backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 17, 2015 at 2:10 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Dec 17, 2015 at 01:02:50PM -0800, Shawn Pearce wrote:
>
>> I started playing around with the idea of storing references directly
>> in Git. Exploiting the GITLINK tree entry, we can associate a name to
>> any SHA-1.
>
> Gitlink entries don't imply reachability, though. I guess that doesn't
> matter if your ref backend says "no, really, these are the ref tips, and
> they are reachable".

Exactly. This works with existing JGit because it swaps out the ref
backend. When GC tries to enumerate the roots (current refs), it gets
these through the ref backend by scanning the tree recursively. The
packer itself doesn't care where those roots came from.

Same would be true for any other pluggable ref backend in git-core. GC
has to ask the ref backend, and then trust its reply. How/where that
ref backend tracks that is an implementation detail.

>  But you could not push the whole thing up to
> another server and expect it to hold the whole graph.

Correct, pushing this to another repository doesn't transmit the
graph. If the other repository also used this for its refs backend,
its now corrupt and confused out of its mind. Just like copying the
packed-refs file with scp. Don't do that. :)

> Which is not strictly necessary, but to me seems like the real advantage
> of using git objects versus some other system.

One advantage is you can edit HEAD symref remotely. Commit a different
symlink value and push. :)

I want to say more, but I'm going to hold back right now. There's more
going on in my head than just this.

> Of course, the lack of reachability has advantages, too. You can
> drop commits pointed to by old reflogs without rewriting the ref
> history.

Yes.

> Unfortunately you cannot expunge the reflogs at all. That's
> good if you like audit trails. Bad if you are worried that your reflogs
> will grow large. :)

At present our servers do not truncate their reflogs. Yes some are... big.

I considered truncating this graph by just using a shallow marker. Add
a shallow entry and repack. The ancient history will eventually be
garbage collected and disappear.

One advantage of this format is deleted branches can retain a reflog
post deletion. Another is you can trivially copy the reflog using
native Git to another system for backup purposes. Or fetch it over the
network to inspect locally. So a shared group server could be
exporting its reflog, you can fetch it and review locally what
happened to branches without logging into the shared server.

So long as you remember that copying the reflog doesn't mean you
actually copied the commit histories, its works nicely.

Another advantage of this format over LMDB or TDB or whatever is Git
already understands it. The tools already understand it. Plumbing can
inspect and repair things. You can reflog the reflog using traditional
reflog ($GIT_DIR/reflogs/refs/txn/committed).

>> By storing all references in a single tree, atomic transactions are
>> possible. Its a simple compare-and-swap of a single 40 byte SHA-1.
>> This of course leads to a bootstrapping problem, where do we store the
>> 40 byte SHA-1? For this example its just $GIT_DIR/refs/txn/committed
>> as a classical loose reference.
>
> Somehow putting it inside `refs/` seems weird to me, in an infinite
> recursion kind of way.  I would have picked $GIT_DIR/REFSTREE or
> something. But that is a minor point.

I had started with $GIT_DIR/REFS, but see above. I have more going on
in my head. This is only a tiny building block.

>> Configuration:
>>
>>   [core]
>>     repositoryformatversion = 1
>>   [extensions]
>>     refsBackendType = RefTree
>
> The semantics of extensions config keys are open-ended. The
> formatVersion=1 spec only says "if there is a key you don't know about,
> then you may not proceed". Now we're defining a refsBackendType
> extension. It probably makes sense to write up a few rules (e.g., is
> RefTree case-sensitive?).

In my prototype in JGIt I parse it as case insensitive, but used
CamelCase because the JavaClassNameIsNamedThatWayBecauseJava.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]