On Thu, Jan 25, 2018 at 9:02 PM, Derrick Stolee <stolee@xxxxxxxxx> wrote:
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.
> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:
> +
> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to avoid topological order mistakes.
> +
> +The packed graph is a file that stores the commit graph structure along
> +with some extra metadata to speed up graph walks. This format allows a
> +consumer to load the following info for a commit:
> +
> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).

I didn't look closely to compare, but perhaps you should check out pack
file format version 4 [1]. It tried to address the same thing but it
never got to the point where we could replace our current pack format
with it. At some point I wanted to push it even as a local optimization
(pack transfer is still in old format) but I never had enough time or
energy for it. How it stores commits though can probably be reused.

[1] https://github.com/pclouds/git/commit/23cb8ae5bdd968c1a290ff8d0fd7cb6b4d572a43
--
Duy
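
For reference, the six per-commit fields listed in the quoted text could be modelled roughly as below. This is only a sketch to make the comparison concrete, not the layout from the patch or from pack v4. All names (struct graph_commit, struct packed_graph, graph_find, graph_parent) are hypothetical. The 20-byte OID, the fixed two parent slots, and the assumption that entries are sorted by OID so that the array position doubles as the integer ID are simplifications made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OID_RAWSZ   20          /* assumed SHA-1 sized OIDs */
#define MAX_PARENTS 2           /* sketch only: octopus merges not handled */
#define GRAPH_NONE  UINT32_MAX  /* "no parent" / "not found" marker */

/* One record per commit; its index in the array is its integer ID. */
struct graph_commit {
	unsigned char oid[OID_RAWSZ];       /* 1. commit OID */
	uint32_t parents[MAX_PARENTS];      /* 2. parents, as integer IDs */
	uint64_t commit_date;               /* 3. commit date */
	unsigned char tree_oid[OID_RAWSZ];  /* 4. root tree OID */
	uint32_t generation;                /* 6. generation number */
};

struct packed_graph {
	const struct graph_commit *commits; /* assumed sorted by oid */
	uint32_t nr;
};

/* Binary search by OID; returns the integer ID (5.) or GRAPH_NONE. */
static uint32_t graph_find(const struct packed_graph *g,
			   const unsigned char *oid)
{
	uint32_t lo = 0, hi = g->nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, g->commits[mid].oid, OID_RAWSZ);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return GRAPH_NONE;
}

/* Follow the n-th parent edge without parsing any commit object. */
static const struct graph_commit *graph_parent(const struct packed_graph *g,
					       const struct graph_commit *c,
					       unsigned int n)
{
	if (n >= MAX_PARENTS || c->parents[n] == GRAPH_NONE)
		return NULL;
	assert(c->parents[n] < g->nr);
	return &g->commits[c->parents[n]];
}
```

With a layout like this, a walk pays for one OID lookup at the start and then follows parent edges as plain array indexing, which is the "decompressing and parsing" cost the quoted text wants to avoid.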