On Thu, Dec 17, 2015 at 2:10 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Dec 17, 2015 at 01:02:50PM -0800, Shawn Pearce wrote:
>
>> I started playing around with the idea of storing references directly
>> in Git. Exploiting the GITLINK tree entry, we can associate a name to
>> any SHA-1.
>
> Gitlink entries don't imply reachability, though. I guess that doesn't matter if your ref backend says "no, really, these are the ref tips, and they are reachable".

Exactly. This works with existing JGit because it swaps out the ref backend. When GC tries to enumerate the roots (current refs), it gets them through the ref backend, which scans the tree recursively. The packer itself doesn't care where those roots came from.

The same would be true for any other pluggable ref backend in git-core. GC has to ask the ref backend and then trust its reply. How and where that ref backend tracks the refs is an implementation detail.

> But you could not push the whole thing up to another server and expect it to hold the whole graph.

Correct, pushing this to another repository doesn't transmit the graph. If the other repository also used this for its refs backend, it's now corrupt and confused out of its mind. Just like copying the packed-refs file with scp. Don't do that. :)

> Which is not strictly necessary, but to me seems like the real advantage of using git objects versus some other system.

One advantage is that you can edit the HEAD symref remotely. Commit a different symlink value and push. :)

I want to say more, but I'm going to hold back right now. There's more going on in my head than just this.

> Of course, the lack of reachability has advantages, too. You can drop commits pointed to by old reflogs without rewriting the ref history.

Yes.

> Unfortunately you cannot expunge the reflogs at all. That's good if you like audit trails. Bad if you are worried that your reflogs will grow large. :)

At present our servers do not truncate their reflogs. Yes, some are... big.
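To make the root-enumeration point above a little more concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory model of a tree (a dict of name -> (mode, value)) instead of real tree objects, and a layout where the tree path is the full ref name; `enumerate_refs`, `gc_roots` and `resolve` are made-up names for illustration, not JGit's API. Gitlinks (mode 160000) carry ref values, and HEAD is modeled as a symlink (mode 120000) whose target is a ref name, so "commit a different symlink value" retargets HEAD.

```python
from typing import Dict, Set, Tuple, Union

MODE_TREE = "040000"
MODE_GITLINK = "160000"
MODE_SYMLINK = "120000"

# Hypothetical stand-in for a git tree object: name -> (mode, value).
# Subtrees hold another dict, gitlinks hold a 40-hex SHA-1, and symlinks
# hold a target ref name (used here to model a symref such as HEAD).
Tree = Dict[str, Tuple[str, Union["Tree", str]]]


def enumerate_refs(tree: Tree, prefix: str = "") -> Tuple[Dict[str, str], Dict[str, str]]:
    """Recursively walk the tree, collecting gitlink refs and symlink symrefs."""
    refs: Dict[str, str] = {}
    symrefs: Dict[str, str] = {}
    for name, (mode, value) in sorted(tree.items()):
        path = prefix + name
        if mode == MODE_TREE:
            sub_refs, sub_symrefs = enumerate_refs(value, path + "/")
            refs.update(sub_refs)
            symrefs.update(sub_symrefs)
        elif mode == MODE_GITLINK:
            refs[path] = value
        elif mode == MODE_SYMLINK:
            symrefs[path] = value
        else:
            raise ValueError(f"unexpected mode {mode} at {path}")
    return refs, symrefs


def gc_roots(tree: Tree) -> Set[str]:
    """SHA-1s GC must keep: gitlinks are not followed by normal reachability."""
    refs, _ = enumerate_refs(tree)
    return set(refs.values())


def resolve(name: str, tree: Tree, max_depth: int = 5) -> str:
    """Follow symrefs (symlink entries) until a gitlink SHA-1 is reached."""
    refs, symrefs = enumerate_refs(tree)
    for _ in range(max_depth):
        if name in refs:
            return refs[name]
        if name not in symrefs:
            raise KeyError(name)
        name = symrefs[name]
    raise RecursionError(f"symref chain too deep at {name}")


if __name__ == "__main__":
    repo: Tree = {
        "HEAD": (MODE_SYMLINK, "refs/heads/master"),
        "refs": (MODE_TREE, {
            "heads": (MODE_TREE, {
                "master": (MODE_GITLINK, "a" * 40),
                "next": (MODE_GITLINK, "b" * 40),
            }),
        }),
    }
    print(sorted(gc_roots(repo)))
    # Retargeting HEAD is just a different symlink value in the next tree.
    repo["HEAD"] = (MODE_SYMLINK, "refs/heads/next")
    print(resolve("HEAD", repo))
```

The design point is that GC only calls something like `gc_roots` and trusts the answer; the packer never needs to know the roots came from a tree walk.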
I considered truncating this graph by just using a shallow marker. Add a shallow entry and repack. The ancient history will eventually be garbage collected and disappear.

One advantage of this format is that deleted branches can retain a reflog after deletion. Another is that you can trivially copy the reflog to another system for backup purposes using native Git. Or fetch it over the network to inspect locally. So if a shared group server exports its reflog, you can fetch it and review locally what happened to branches without logging into the shared server. So long as you remember that copying the reflog doesn't mean you actually copied the commit histories, it works nicely.

Another advantage of this format over LMDB or TDB or whatever is that Git already understands it. The tools already understand it. Plumbing can inspect and repair things. You can reflog the reflog using the traditional reflog ($GIT_DIR/logs/refs/txn/committed).

>> By storing all references in a single tree, atomic transactions are
>> possible. Its a simple compare-and-swap of a single 40 byte SHA-1.
>> This of course leads to a bootstrapping problem, where do we store the
>> 40 byte SHA-1? For this example its just $GIT_DIR/refs/txn/committed
>> as a classical loose reference.
>
> Somehow putting it inside `refs/` seems weird to me, in an infinite recursion kind of way. I would have picked $GIT_DIR/REFSTREE or something. But that is a minor point.

I had started with $GIT_DIR/REFS, but see above. I have more going on in my head. This is only a tiny building block.

>> Configuration:
>>
>> [core]
>> repositoryformatversion = 1
>> [extensions]
>> refsBackendType = RefTree
>
> The semantics of extensions config keys are open-ended. The formatVersion=1 spec only says "if there is a key you don't know about, then you may not proceed". Now we're defining a refsBackendType extension. It probably makes sense to write up a few rules (e.g., is RefTree case-sensitive?).
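The compare-and-swap of a single 40 byte SHA-1 quoted above could look roughly like this sketch. It assumes the pointer lives in a loose file such as $GIT_DIR/refs/txn/committed and that every writer follows git's usual `<file>.lock` convention (create the lock with O_EXCL, write, then rename over the target). `read_ref` and `compare_and_swap` are hypothetical helpers, not git-core or JGit functions, and the sketch ignores reflog writing and packed-refs entirely.

```python
import os
import re
from typing import Optional

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def read_ref(path: str) -> Optional[str]:
    """Return the SHA-1 stored in a loose ref file, or None if it is absent."""
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def compare_and_swap(path: str, expected: Optional[str], new: str) -> bool:
    """Replace the SHA-1 at `path` only if it still equals `expected`.

    Writers serialize on `<path>.lock` (created with O_EXCL); readers see
    either the old or the new value because the update is a rename.
    """
    if not SHA1_HEX.fullmatch(new):
        raise ValueError(f"not a 40-hex SHA-1: {new!r}")
    lock = path + ".lock"
    try:
        fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    except FileExistsError:
        return False  # another writer currently holds the lock
    committed = False
    try:
        with os.fdopen(fd, "w", encoding="ascii") as out:
            if read_ref(path) != expected:
                return False  # someone else moved the pointer first
            out.write(new + "\n")
            out.flush()
            os.fsync(out.fileno())
        os.replace(lock, path)  # atomic rename on POSIX filesystems
        committed = True
        return True
    finally:
        if not committed:
            os.unlink(lock)  # drop our lock so the next writer can retry
```

Because the whole ref database is one tree, this single swap is the entire transaction: a losing writer rereads the pointer, rebuilds its tree on top of the new value, and tries again.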
In my prototype in JGit I parse it as case-insensitive, but used CamelCase because the JavaClassNameIsNamedThatWayBecauseJava.
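A case-insensitive reading of the configuration quoted above might look like the following sketch. The toy parser only handles `[section]` and `key = value` lines; the known-extension set, the canonical-name table, the `"files"` default, and ignoring extensions under version 0 are all assumptions for illustration. The one rule taken from the thread is that an unknown key under `[extensions]` with repositoryformatversion = 1 means we may not proceed.

```python
from typing import Dict

# Assumption: the only extension this reader knows is the one from this thread.
KNOWN_EXTENSIONS = {"refsbackendtype"}
# Assumption: lowercase value -> canonical CamelCase backend name.
KNOWN_BACKENDS = {"reftree": "RefTree"}


def parse_config(text: str) -> Dict[str, str]:
    """Toy parser: lowercases section and key names, keeps values verbatim."""
    cfg: Dict[str, str] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section is None or "=" not in line:
            raise ValueError(f"unsupported config line: {raw!r}")
        key, value = line.split("=", 1)
        cfg[f"{section}.{key.strip().lower()}"] = value.strip()
    return cfg


def refs_backend(cfg: Dict[str, str]) -> str:
    """Pick the ref backend, refusing to proceed on unknown extensions."""
    version = int(cfg.get("core.repositoryformatversion", "0"))
    if version > 1:
        raise ValueError(f"unsupported repositoryformatversion {version}")
    if version == 0:
        return "files"  # assumption: extensions only take effect in version 1
    for key in cfg:
        if key.startswith("extensions."):
            name = key[len("extensions."):]
            if name not in KNOWN_EXTENSIONS:
                raise ValueError(f"unknown extension {name!r}; refusing to proceed")
    value = cfg.get("extensions.refsbackendtype")
    if value is None:
        return "files"  # hypothetical name for the classic loose/packed backend
    canonical = KNOWN_BACKENDS.get(value.lower())  # value compared case-insensitively
    if canonical is None:
        raise ValueError(f"unknown refsBackendType {value!r}")
    return canonical
```

With this reading, `RefTree`, `reftree` and `REFTREE` all select the same backend, while the CamelCase spelling stays the canonical name a tool would print.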