Re: [PATCH v7 00/11] [GSoC] Implement Corrected Commit Date

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/1/2021 1:58 AM, Abhishek Kumar via GitGitGadget wrote:

> Changes in version 7:
> 
>  * Moved the documentation patch ahead of "commit-graph: implement corrected
>    commit date" and elaborated on the introduction of generation number v2.

The only change in this version is this commit message:

>  11:  e571f03d8bd !  7:  8647b5d2e38 doc: add corrected commit date info
>      @@ Metadata
>       Author: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
>       
>        ## Commit message ##
>      -    doc: add corrected commit date info
>      +    commit-graph: document generation number v2
>       
>      -    With generation data chunk and corrected commit dates implemented, let's
>      -    update the technical documentation for commit-graph.
>      +    Git uses topological levels in the commit-graph file for commit-graph
>      +    traversal operations like 'git log --graph'. Unfortunately, topological
>      +    levels can perform worse than committer date when parents of a commit
>      +    differ greatly in generation numbers [1]. For example, 'git merge-base
>      +    v4.8 v4.9' on the Linux repository walks 635,579 commits using
>      +    topological levels and walks 167,468 using committer date. Since
>      +    091f4cf3 (commit: don't use generation numbers if not needed,
>      +    2018-08-30), 'git merge-base' uses committer date heuristic unless there
>      +    is a cutoff because of the performance hit.
>      +
>      +    [1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@xxxxxxxxx/
>      +
>      +    Thus, the need for generation number v2 was born. As Git used to die
>      +    when graph version understood by it and in the commit-graph file are
>      +    different [2], we needed a way to distinguish between the old and new
>      +    generation number without incrementing the graph version.
>      +
>      +    [2] https://lore.kernel.org/git/87a7gdspo4.fsf@xxxxxxxxxxxxxxxxxxx/
>      +
>      +    The following candidates were proposed (https://github.com/derrickstolee/gen-test,
>      +    https://github.com/abhishekkumar2718/git/pull/1):
>      +    - (Epoch, Date) Pairs.
>      +    - Maximum Generation Numbers.
>      +    - Corrected Commit Date.
>      +    - FELINE Index.
>      +    - Corrected Commit Date with Monotonically Increasing Offsets.
>      +
>      +    Based on performance, local computability, and immutability (along with
>      +    the introduction of an additional commit-graph chunk which relieved the
>      +    requirement of backwards-compatibility) Corrected Commit Date was chosen
>      +    as generation number v2 and is defined as follows:
>      +
>      +    For a commit C, let its corrected commit date  be the maximum of the
>      +    commit date of C and the corrected commit dates of its parents plus 1.
>      +    Then corrected commit date offset is the difference between corrected
>      +    commit date of C and commit date of C. As a special case, a root commit
>      +    with the timestamp zero has corrected commit date of 1 to distinguish it
>      +    from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number).
>      +
>      +    While it was proposed initially to store corrected commit date offsets
>      +    within Commit Data Chunk, storing the offsets in a new chunk did not
>      +    affect the performance measurably. The new chunk is "Generation DATa
>      +    (GDAT) chunk" and it stores corrected commit date offsets while CDAT
>      +    chunk stores topological level. The old versions of Git would ignore
>      +    GDAT chunk, using topological levels from CDAT chunk. In contrast, new
>      +    versions of Git would use corrected commit dates, falling back to
>      +    topological level if the generation data chunk is absent in the
>      +    commit-graph file.
>      +
>      +    While storing corrected commit date offsets saves us 4 bytes per commit
>      +    (as compared with storing corrected commit dates directly), it's however
>      +    possible for the offset to overflow the space allocated. To handle such
>      +    cases, we introduce a new chunk, _Generation Data Overflow_ (GDOV) that
>      +    stores the corrected commit date. For overflowing offsets, we set MSB
>      +    and store the position into the GDOV chunk, in a mechanism similar to
>      +    the Extra Edges list chunk.
>      +
>      +    For mixed generation number environment (for example new Git on the
>      +    command line, old Git used by GUI client), we can encounter a
>      +    mixed-chain commit-graph (a commit-graph chain where some of split
>      +    commit-graph files have GDAT chunk and others do not). As backward
>      +    compatibility is one of the goals, we can define the following behavior:
>      +
>      +    While reading a mixed-chain commit-graph version, we fall back on
>      +    topological levels as corrected commit dates and topological levels
>      +    cannot be compared directly.
>      +
>      +    When adding new layer to the split commit-graph file, and when merging
>      +    some or all layers (replacing them in the latter case), the new layer
>      +    will have GDAT chunk if and only if in the final result there would be
>      +    no layer without GDAT chunk just below it.

While that is a quality message, v6 has landed in 'next' and I've begun
working off of that version. As Taylor attempted to say [1], this topic
should be considered final and updates should be follow-ups on top.

[1] https://lore.kernel.org/git/YBYLwpKdUfxCNwaz@nand.local/

(Of course, if Junio says differently, then listen to him.)

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux