Re: [PATCH v6 11/11] doc: add corrected commit date info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 27, 2021 at 01:04:54AM +0100, SZEDER Gábor wrote:
> On Sat, Jan 16, 2021 at 06:11:18PM +0000, Abhishek Kumar via GitGitGadget wrote:
> > With generation data chunk and corrected commit dates implemented, let's
> > update the technical documentation for commit-graph.
> 
> This patch should come much earlier in this series, before patch 07/11
> (commit-graph: implement corrected commit date), or perhaps even
> earlier.  That way if someone were to investigate an issue in this
> series and checks out one of its commits, then the specification and
> the will be right there under 'Documentation/technical/'.
> 
> Furthermore, a patch introducing a new chunk format is the right place
> to justify the introduction of said new chunk.  What problems does a
> chunk of corrected commit dates solve?  Why does it solve them?  Why
> do we need corrected commit dates instead of simple commit dates?
> What alternatives were considered [1]?  Any other design considerations
> worth mentioning for the benefit of future readers?
> 
> None of the patches' log messages properly explain these, and while
> much of these is indeed explained in the cover letter, the cover
> letter will not be part of the history.  Requiring to look up mailing
> list archives for the justification puts unnecessary burden on other
> developers who might get interested in this feature in the future.
> 
> You might want to take
> https://public-inbox.org/git/20200529085038.26008-16-szeder.dev@xxxxxxxxx/
> as an inspiration.
> 

Alright, the suggestion makes a lot of sense and the patch introducing
documentation is the perfect place to justify the introduction of new
chunk format.

> 
> [1] Please remember the following snippet from SubmittingPatches:
>     "Try to make sure your explanation can be understood without
>     external resources. Instead of giving a URL to a mailing list
>     archive, summarize the relevant points of the discussion."
> 
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> > ---
> >  .../technical/commit-graph-format.txt         | 28 +++++--
> >  Documentation/technical/commit-graph.txt      | 77 +++++++++++++++----
> >  2 files changed, 86 insertions(+), 19 deletions(-)
> > 
> > diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> > index b3b58880b92..b6658eff188 100644
> > --- a/Documentation/technical/commit-graph-format.txt
> > +++ b/Documentation/technical/commit-graph-format.txt
> > @@ -4,11 +4,7 @@ Git commit graph format
> >  The Git commit graph stores a list of commit OIDs and some associated
> >  metadata, including:
> >  
> > -- The generation number of the commit. Commits with no parents have
> > -  generation number 1; commits with parents have generation number
> > -  one more than the maximum generation number of its parents. We
> > -  reserve zero as special, and can be used to mark a generation
> > -  number invalid or as "not computed".
> > +- The generation number of the commit.
> >  
> >  - The root tree OID.
> >  
> > @@ -86,13 +82,33 @@ CHUNK DATA:
> >        position. If there are more than two parents, the second value
> >        has its most-significant bit on and the other bits store an array
> >        position into the Extra Edge List chunk.
> > -    * The next 8 bytes store the generation number of the commit and
> > +    * The next 8 bytes store the topological level (generation number v1)
> > +      of the commit and
> >        the commit time in seconds since EPOCH. The generation number
> >        uses the higher 30 bits of the first 4 bytes, while the commit
> >        time uses the 32 bits of the second 4 bytes, along with the lowest
> >        2 bits of the lowest byte, storing the 33rd and 34th bit of the
> >        commit time.
> >  
> > +  Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
> > +    * This list of 4-byte values store corrected commit date offsets for the
> > +      commits, arranged in the same order as commit data chunk.
> > +    * If the corrected commit date offset cannot be stored within 31 bits,
> > +      the value has its most-significant bit on and the other bits store
> > +      the position of corrected commit date into the Generation Data Overflow
> > +      chunk.
> > +    * Generation Data chunk is present only when commit-graph file is written
> > +      by compatible versions of Git and in case of split commit-graph chains,
> > +      the topmost layer also has Generation Data chunk.
> > +
> > +  Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
> > +    * This list of 8-byte values stores the corrected commit date offsets
> > +      for commits with corrected commit date offsets that cannot be
> > +      stored within 31 bits.
> > +    * Generation Data Overflow chunk is present only when Generation Data
> > +      chunk is present and atleast one corrected commit date offset cannot
> > +      be stored within 31 bits.
> > +
> >    Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
> >        This list of 4-byte values store the second through nth parents for
> >        all octopus merges. The second parent value in the commit data stores
> > diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
> > index f14a7659aa8..f05e7bda1a9 100644
> > --- a/Documentation/technical/commit-graph.txt
> > +++ b/Documentation/technical/commit-graph.txt
> > @@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph:
> >  
> >  Values 1-4 satisfy the requirements of parse_commit_gently().
> >  
> > -Define the "generation number" of a commit recursively as follows:
> > +There are two definitions of generation number:
> > +1. Corrected committer dates (generation number v2)
> > +2. Topological levels (generation nummber v1)
> >  
> > - * A commit with no parents (a root commit) has generation number one.
> > +Define "corrected committer date" of a commit recursively as follows:
> >  
> > - * A commit with at least one parent has generation number one more than
> > -   the largest generation number among its parents.
> > + * A commit with no parents (a root commit) has corrected committer date
> > +    equal to its committer date.
> >  
> > -Equivalently, the generation number of a commit A is one more than the
> > + * A commit with at least one parent has corrected committer date equal to
> > +    the maximum of its commiter date and one more than the largest corrected
> > +    committer date among its parents.
> > +
> > + * As a special case, a root commit with timestamp zero has corrected commit
> > +    date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO
> > +    (that is, an uncomputed corrected commit date).
> > +
> > +Define the "topological level" of a commit recursively as follows:
> > +
> > + * A commit with no parents (a root commit) has topological level of one.
> > +
> > + * A commit with at least one parent has topological level one more than
> > +   the largest topological level among its parents.
> > +
> > +Equivalently, the topological level of a commit A is one more than the
> >  length of a longest path from A to a root commit. The recursive definition
> >  is easier to use for computation and observing the following property:
> >  
> > @@ -60,6 +77,9 @@ is easier to use for computation and observing the following property:
> >      generation numbers, then we always expand the boundary commit with highest
> >      generation number and can easily detect the stopping condition.
> >  
> > +The property applies to both versions of generation number, that is both
> > +corrected committer dates and topological levels.
> > +
> >  This property can be used to significantly reduce the time it takes to
> >  walk commits and determine topological relationships. Without generation
> >  numbers, the general heuristic is the following:
> > @@ -67,7 +87,9 @@ numbers, the general heuristic is the following:
> >      If A and B are commits with commit time X and Y, respectively, and
> >      X < Y, then A _probably_ cannot reach B.
> >  
> > -This heuristic is currently used whenever the computation is allowed to
> > +In absence of corrected commit dates (for example, old versions of Git or
> > +mixed generation graph chains),
> > +this heuristic is currently used whenever the computation is allowed to
> >  violate topological relationships due to clock skew (such as "git log"
> >  with default order), but is not used when the topological order is
> >  required (such as merge base calculations, "git log --graph").
> > @@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite"
> >  generation number and walk until reaching commits with known generation
> >  number.
> >  
> > -We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
> > +We use the macro GENERATION_NUMBER_INFINITY to mark commits not
> >  in the commit-graph file. If a commit-graph file was written by a version
> >  of Git that did not compute generation numbers, then those commits will
> >  have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
> > @@ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in
> >  walking a few extra commits, but the simplicity in dealing with commits
> >  with generation number *_INFINITY or *_ZERO is valuable.
> >  
> > -We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
> > -generation numbers are computed to be at least this value. We limit at
> > -this value since it is the largest value that can be stored in the
> > -commit-graph file using the 30 bits available to generation numbers. This
> > -presents another case where a commit can have generation number equal to
> > -that of a parent.
> > +We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose
> > +topological levels (generation number v1) are computed to be at least
> > +this value. We limit at this value since it is the largest value that
> > +can be stored in the commit-graph file using the 30 bits available
> > +to topological levels. This presents another case where a commit can
> > +have generation number equal to that of a parent.
> >  
> >  Design Details
> >  --------------
> > @@ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
> >  number of commits) could be extracted into config settings for full
> >  flexibility.
> >  
> > +## Handling Mixed Generation Number Chains
> > +
> > +With the introduction of generation number v2 and generation data chunk, the
> > +following scenario is possible:
> > +
> > +1. "New" Git writes a commit-graph with the corrected commit dates.
> > +2. "Old" Git writes a split commit-graph on top without corrected commit dates.
> > +
> > +A naive approach of using the newest available generation number from
> > +each layer would lead to violated expectations: the lower layer would
> > +use corrected commit dates which are much larger than the topological
> > +levels of the higher layer. For this reason, Git inspects the topmost
> > +layer to see if the layer is missing corrected commit dates. In such a case
> > +Git only uses topological level for generation numbers.
> > +
> > +When writing a new layer in split commit-graph, we write corrected commit
> > +dates if the topmost layer has corrected commit dates written. This
> > +guarantees that if a layer has corrected commit dates, all lower layers
> > +must have corrected commit dates as well.
> > +
> > +When merging layers, we do not consider whether the merged layers had corrected
> > +commit dates. Instead, the new layer will have corrected commit dates if the
> > +layer below the new layer has corrected commit dates.
> > +
> > +While writing or merging layers, if the new layer is the only layer, it will
> > +have corrected commit dates when written by compatible versions of Git. Thus,
> > +rewriting split commit-graph as a single file (`--split=replace`) creates a
> > +single layer with corrected commit dates.
> > +
> >  ## Deleting graph-{hash} files
> >  
> >  After a new tip file is written, some `graph-{hash}` files may no longer
> > -- 
> > gitgitgadget

Thanks
- Abhishek



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux