On Tue, Jun 23, 2015 at 6:47 PM, Jeff King <peff@xxxxxxxx> wrote:
> I was thinking of actually moving to a log-structured ref storage.
> Something like:
>
> - any ref write puts a line at the end of a single logfile that
>   contains the ref name, along with the normal reflog data
>
> - the logfile is the source of truth for the ref state. If you want to
>   know the value of any ref, you can read it backwards to find the
>   last entry for the ref. Everything else is an optimization.
>
>   Let's call the number of refs N, and the number of ref updates in
>   the log U.
>
> - we keep a key/value index mapping the name of any branch that exists
>   to the byte offset of its entry in the logfile. This would probably

One key/value mapping per branch, pointing to the latest reflog entry,
or one key/value mapping for each reflog entry?

>   be in some binary key/value store (like LMDB). Without this,
>   resolving a ref is O(U), which is horrible. With it, it should be
>   O(1) or O(lg N), depending on the index data structure.

I'm thinking of the user with small or medium repos, in terms of refs,
who does not want an extra dependency. If we store one mapping per
branch, then the size of this mapping is small enough that the index
in a text file is ok. If we also store the offset to the previous
reflog entry of the same branch in the current reflog entry, like a
back pointer, then we could jump back faster.

Or do you have something else in mind? Current reflog structure won't
work because I think you bring back the reflog graveyard with this,
and I don't want to lose that.

> - the index can also contain other optimizations. E.g., rather than
>   point to the entry in the logfile, it can include the sha1 directly
>   (to avoid an extra level of indirection). It may want to include the
>   "peeled" value, as the current packed-refs file does.
>
> - Reading all of the reflogs (e.g., for repacking) is O(U), just like
>   it is today.
>   Except the storage for the logfile is a lot more
>   compact than what we store today, with one reflog per file.
>
> - Reading a single reflog is _also_ O(U), which is not as good as
>   today. But if each log entry contains a byte offset of the previous
>   entry, you can follow the chain (it is still slightly worse, because
>   you are jumping all over the file, rather than reading a compact set
>   of lines).
>
> - Pruning the reflog entries from the logfile requires rewriting the
>   whole thing. That's similar to today, where we rewrite each of the
>   reflog files.
--
Duy
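
To make the "one mapping per branch plus a back pointer" idea concrete,
here is a rough sketch in Python. It is not git code and not a proposed
format: the record layout ("prev-offset ref sha msg"), the in-memory
BytesIO standing in for the logfile, and every name in it (RefLog,
update, resolve, reflog, rebuild_index) are made up for illustration.
Ref deletion and pruning are left out.

```python
import io


class RefLog:
    """Toy append-only ref log: one text line per ref update (hypothetical layout)."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the single logfile
        self.index = {}          # ref name -> byte offset of its latest entry

    def update(self, ref, new_sha, msg=""):
        # Back pointer to the previous entry of the same ref; -1 means none.
        prev = self.index.get(ref, -1)
        offset = self.log.seek(0, io.SEEK_END)
        self.log.write(f"{prev} {ref} {new_sha} {msg}\n".encode())
        self.index[ref] = offset

    def _read_at(self, offset):
        self.log.seek(offset)
        line = self.log.readline().decode().rstrip("\n")
        prev, ref, sha, msg = line.split(" ", 3)
        return int(prev), ref, sha, msg

    def resolve(self, ref):
        """One index lookup plus one read, instead of an O(U) backwards scan."""
        offset = self.index.get(ref)
        if offset is None:
            return None
        return self._read_at(offset)[2]

    def reflog(self, ref):
        """Yield (sha, msg) newest first by following the back pointers."""
        offset = self.index.get(ref, -1)
        while offset != -1:
            prev, _, sha, msg = self._read_at(offset)
            yield sha, msg
            offset = prev

    def rebuild_index(self):
        """O(U) scan of the log; the index is only an optimization."""
        index = {}
        offset = 0
        self.log.seek(0)
        for line in self.log:
            ref = line.decode().split(" ", 3)[1]
            index[ref] = offset  # later entries overwrite earlier ones
            offset += len(line)
        return index


log = RefLog()
log.update("refs/heads/master", "aaa", "commit: one")
log.update("refs/heads/topic", "bbb", "branch: created")
log.update("refs/heads/master", "ccc", "commit: two")
print(log.resolve("refs/heads/master"))
print(list(log.reflog("refs/heads/master")))
```

With one mapping per branch the index stays at N entries, so a plain
text file would do for small repos, and walking a single reflog touches
only that ref's entries rather than all U lines.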