Re: [PATCH 01/14] graph: add packed graph design document

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 25, 2018 at 9:02 PM, Derrick Stolee <stolee@xxxxxxxxx> wrote:
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.
> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:
> +
> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to avoid topological order mistakes.
> +
> +The packed graph is a file that stores the commit graph structure along
> +with some extra metadata to speed up graph walks. This format allows a
> +consumer to load the following info for a commit:
> +
> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).

I didn't look closely to compare, but perhaps you should check out
pack file format version 4 [1]. It tried to address the same thing but
it never got to the point where we could replace our current pack
format with it. At some point I wanted to push it even as a local
optimization (pack transfer is still in old format) but I never had
enough time or energy for it. How it stores commits though can
probably be reused.

[1] https://github.com/pclouds/git/commit/23cb8ae5bdd968c1a290ff8d0fd7cb6b4d572a43
-- 
Duy



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux