Re: [PATCH 01/10] technical doc: add a design doc for the evolve command

"Stefan Xenos via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Stefan Xenos <sxenos@xxxxxxxxxx>

[the above address should bounce, and has been removed from cc: list]

> +Background
> +==========
> +Imagine you have three sequential changes up for review and you receive feedback
> +that requires editing all three changes. We'll define the word "change"
> +formally later, but for the moment let's say that a change is a work-in-progress
> +whose final version will be submitted as a commit in the future.
> +
> +While you're editing one change, more feedback arrives on one of the others.
> +What do you do?
> +
> +The evolve command is a convenient way to work with chains of commits that are
> +under review. Whenever you rebase or amend a commit, the repository remembers
> +that the old commit is obsolete and has been replaced by the new one. Then, at
> +some point in the future, you can run "git evolve" and the correct sequence of
> +rebases will occur in the correct order such that no commit has an obsolete
> +parent.
> +
> +Part of making the "evolve" command work involves tracking the edits to a commit
> +over time, which is why we need a change graph. However, the change
> +graph will also bring other benefits:

It would be reassuring to hear that "change graph" will also be
defined and explained formally later, just as the previous paragraph
promises for "change".  Later in this document we see mentions of
"metacommits" and "meta-commits", and I am guessing both of them are
quasi-synonyms for "change graph". If that is true, it is better to
stick to a single term.
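
To make the "no commit has an obsolete parent" goal quoted above
concrete, here is a minimal Python sketch of how the rebases could be
ordered. It is not the algorithm from the patch: the function name
plan_evolve, the dictionaries, and the trailing "'" used to name a
rebased copy are all assumptions for illustration, and divergence is
ignored.

```python
# Hypothetical sketch: order rebases so that no commit keeps an obsolete parent.
def plan_evolve(parent, replaced_by, order):
    """parent: commit -> its single parent (None for a root).
    replaced_by: obsolete commit -> its replacement.
    order: the non-obsolete commits, listed parents first.
    Returns a list of (commit, new_parent) rebases in execution order."""
    successors = dict(replaced_by)  # work on a copy; rebases add entries

    def latest(commit):
        # Follow the chain of replacements to the newest version.
        while commit in successors:
            commit = successors[commit]
        return commit

    plan = []
    for commit in order:
        old_parent = parent[commit]
        if old_parent is None:
            continue
        target = latest(old_parent)
        if target != old_parent:
            plan.append((commit, target))
            # The rebased copy now obsoletes the original commit.
            successors[commit] = commit + "'"
    return plan


# A <- B <- C, where A was amended into A2.
parent = {"A": None, "A2": None, "B": "A", "C": "B"}
plan = plan_evolve(parent, {"A": "A2"}, ["A2", "B", "C"])
```

In this example B is rebased onto A2 first, and C is then rebased
onto the rewritten B, which is why the commits have to be visited
parents first.
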

> +Goals
> +-----
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.
> +
> +P0. All commands that modify commits (such as the normal commit --amend or
> +    rebase command) should mark the old commit as being obsolete and replaced by
> +    the new one. No additional commands should be required to keep the
> +    change graph up-to-date.
> +P0. Any commit that may be involved in a future evolve command should not be
> +    garbage collected. Specifically:
> +    - Commits that obsolete another should not be garbage collected until
> +      user-specified conditions have occurred and the change has expired from
> +      the reflog. User specified conditions for removing changes include:
> +      - The user explicitly deleted the change.
> +      - The change was merged into a specific branch.
> +    - Commits that have been obsoleted by another should not be garbage
> +      collected if any of their replacements are still being retained.
> +P0. A commit can be obsoleted by more than one replacement (called divergence).
> +P0. Users must be able to resolve divergence (convergence).
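
The retention rule for obsoleted commits in the second P0 item
amounts to a reachability walk over "replaced" links. A minimal
Python sketch, assuming a hypothetical mapping from each commit to the
commits it made obsolete (neither the function name nor the data
layout comes from the patch):

```python
# Hypothetical sketch: an obsoleted commit is kept for as long as any of
# its (direct or indirect) replacements is kept.
def commits_to_retain(retained, replaces):
    """retained: commits already kept for other reasons (refs, reflog).
    replaces: commit -> list of commits it made obsolete.
    Returns the full set of commits that must survive garbage collection."""
    keep = set(retained)
    stack = list(retained)
    while stack:
        commit = stack.pop()
        for old in replaces.get(commit, ()):
            if old not in keep:
                keep.add(old)
                stack.append(old)
    return keep


replaces = {"A3": ["A2"], "A2": ["A1"], "B2": ["B1"]}
kept = commits_to_retain({"A3"}, replaces)
```

Here A1 and A2 are kept because A3 is, while B1 and B2 are not,
because nothing retained leads to them.
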

P0: a single-parent commit should keep only one parent. IOW, the
"change graph" implementation should not contaminate the end-result
commit in the regular part of the history, which is the product of
the final iteration of a "change".

IOW ...

> +P2. It should be possible to discard part or all of the change graph
> +    without discarding the commits themselves that are already present in
> +    branches and the reflog.

... this item should be P0.

> +Overview
> +========
> +We introduce the notion of “meta-commits” which describe how one commit was

Random appearances of smart quotes are annoying.  We'll be formatting
the doc via AsciiDoc, so let's stick to vanilla double or single quotes.

> +created from other commits. A branch of meta-commits is known as a change.
> +Changes are created and updated automatically whenever a user runs a command
> +that creates a commit. They are used for locating obsolete commits, providing a
> +list of a user’s unsubmitted work in progress, and providing a stable name for
> +each unsubmitted change.

Can "change graph" be defined and explained here, too?  Or if
it is pretty much a synonym for "a branch of meta-commits", then
perhaps the document can avoid introducing the term "change
graph" and still stay understandable?

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p <example_meta_commit>
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos <sxenos@xxxxxxxxx> 1540841596 -0700
> +committer Stefan Xenos <sxenos@xxxxxxxxx> 1540841596 -0700
> +parent-type c r o
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree, but future versions of git
> +may attach other trees here. For forward-compatibility fsck should ignore such
> +trees if found on future repository versions. This will allow future versions of
> +git to add metadata to the meta-commit tree without breaking forwards
> +compatibility.

It is not clear why such trees need to be ignored only by fsck and
not by others like fetch/push, and I'd strongly advise against making
such a special case.  If they are missing, they are missing and should
be reported as corruption, and if you do not like it, do not add a
missing tree.

> +Parent-type
> +-----------
> +The “parent-type” field in the commit header identifies a commit as a
> +meta-commit and indicates the meaning for each of its parents. It is never
> +present for normal commits. It contains a space-delimited list of enum values
> +whose order matches the order of the parents. Possible parent types are:

> +- c: (content) the content parent identifies the commit that this meta-commit is
> +  describing.
> +- r: (replaced) indicates that this parent is made obsolete by the content
> +  parent.
> +- o: (origin) indicates that the content parent was generated by cherry-picking
> +  this parent.
> +- a: (abandoned) used in place of a content parent for abandoned changes. Points
> +  to the final content commit for the change at the time it was abandoned.

Don't be cute with parent-type using single letters. You'll thank me
later when you need two types that share the first letter.
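
As a rough illustration of how a reader of this header would pair
parents with their types, here is a minimal Python sketch. The
function name parse_meta_commit and the spelled-out type words are
assumptions made for this example; only the header layout and the
letters c, r, o and a come from the patch.

```python
# Hypothetical sketch: pair each "parent" line of a meta-commit with its type.
# The single-letter codes are the ones the patch proposes; the longer names
# only illustrate what unambiguous type words could look like.
PARENT_TYPES = {"c": "content", "r": "replaced", "o": "origin", "a": "abandoned"}


def parse_meta_commit(raw):
    """Return [(type_name, parent_id), ...], or None for a normal commit."""
    header, _, _body = raw.partition("\n\n")
    parents, codes = [], None
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key == "parent-type":
            codes = value.split()
    if codes is None:
        return None  # no parent-type header, so not a meta-commit
    if len(codes) != len(parents):
        raise ValueError("parent-type count does not match parent count")
    unknown = [code for code in codes if code not in PARENT_TYPES]
    if unknown:
        raise ValueError("unknown parent type: " + unknown[0])
    return [(PARENT_TYPES[code], p) for code, p in zip(codes, parents)]


example = """tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent aa7ce55545bf2c14bef48db91af1a74e2347539a
parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
author Stefan Xenos <sxenos@xxxxxxxxx> 1540841596 -0700
committer Stefan Xenos <sxenos@xxxxxxxxx> 1540841596 -0700
parent-type c r o
"""
pairs = parse_meta_commit(example)
```

With a lookup table like this, a second type starting with an
already-used letter has no obvious code, which is the problem with
single letters.
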

> +A meta-commit can have zero or more origin parents. A cherry-pick creates a
> +single origin parent. Certain types of squash merge will create multiple origin
> +parents. Origin parents don't directly cause their origin to become obsolete,
> +but are used when computing blame or locating a merge base. The section
> +on obsolescence over cherry-picks describes how the evolve command uses
> +origin parents.

Should it make a difference which of these operations is performed?

 - running "commit --amend" after "cherry-pick --no-commit", possibly
   with editing,

 - running "commit --amend" after manually editing the same way, and

 - running "commit --amend" after "cherry-pick", possibly with editing?

It seems that the first two will not be captured while the last one
leaves 'origin'.  What should happen when running "commit --amend"
after "apply --index", possibly with editing?

What's the point of giving 'origin' only for "cherry-pick" and
squash merge?  I am wondering whether we also want to record
contributions sourced from an e-mailed patch from elsewhere (currently
people use external services like patchwork to do this).

For the purpose of discussing "evolve", should "rebase" (with or
without "-i") be treated pretty much the same as a series of
"cherry-pick" mixed with "commit --amend" (possibly preceded with a
manual edit), followed by finally replacing the tip of the branch?
In the end result, the replaced commits after a "rebase" become
accessible only from reflog, but other than that, these two bulk
transplanting operations shouldn't be all that different.

> +The parent-type field needs to go after the committer field since git's rules
> +for forwards-compatibility require new fields to be at the end of the
> +header. Putting a new field in the middle of the header would break fsck.

You can avoid the compatibility issue without introducing a new
header by recording the information in the body of the commit object,
which would be even cleaner.

> +Change deletion
> +---------------
> +Changes are normally only interesting to a user while a commit is still in
> +development and under review. Once the commit has been submitted wherever it is
> +going, its change can be discarded.
> +
> +The normal way of deleting changes makes this easy to do - changes are deleted
> +by the evolve command when it detects that the change is present in an upstream
> +branch. It does this in two ways: if the latest commit in a change either shows
> +up in the branch history or the change becomes empty after a rebase, it is
> +considered merged and the change is discarded. In this context, an “upstream
> +branch” is any branch passed in as the upstream argument of the evolve command.
> +
> +In case this sometimes deletes a useful change, such automatic deletions are
> +recorded in the reflog allowing them to be easily recovered.

Deleting a useful change is recorded in the reflog?  Isn't a change
recorded as a ref in the metas/ hierarchy?  Doesn't the removal of
such a ref remove its reflog as well?

I guess the above silly questions come from the fact that the
document does not make it clear in which ref's reflog the deletion is
recorded.

> +Modify commands
> +---------------
> +Modification commands (commit --amend, rebase) will mark the old commit as
> +obsolete by creating a new meta-commit that references the old one as a
> +replaced parent. In the event that multiple changes point to the same commit,
> +this is done independently for every such change.
> +
> +More specifically, modifications work like this:
> +
> +1. Locate all existing changes for which the old commit is the content for the
> +   head of the change branch. If no such branch exists, create one that points
> +   to the old commit. Changes that include this commit in their history but not
> +   at their head are explicitly not included.
> +2. For every such change, create a new meta-commit that references the new
> +   commit as its content and references the old head of the change as a
> +   replaced parent.
> +3. Move the change branch forward to point to the new meta-commit.
> +
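
The three steps quoted above can be sketched with a small in-memory
model. This is only an illustration under stated assumptions: the
MetaCommit class, the record_modification function and the idea of
passing in a name for a newly created change are hypothetical, and a
change head is modelled as either a plain commit id or a MetaCommit.

```python
# Hypothetical in-memory model of the three steps for modify commands.
from dataclasses import dataclass, field


@dataclass
class MetaCommit:
    content: str                                   # commit being described
    replaced: list = field(default_factory=list)   # previous change heads


def content_of(head):
    # A change head is either a plain commit id or a meta-commit.
    return head.content if isinstance(head, MetaCommit) else head


def record_modification(changes, old_commit, new_commit, new_name):
    """changes maps change name -> head; updated in place."""
    # Step 1: changes whose head has old_commit as content; create one if none.
    names = [n for n, h in changes.items() if content_of(h) == old_commit]
    if not names:
        changes[new_name] = old_commit
        names = [new_name]
    for name in names:
        # Step 2: new meta-commit with the old head as a replaced parent.
        meta = MetaCommit(content=new_commit, replaced=[changes[name]])
        # Step 3: move the change branch forward.
        changes[name] = meta


changes = {}
record_modification(changes, "A1", "A2", "topic")   # first amend creates "topic"
record_modification(changes, "A2", "A3", "unused")  # second amend reuses "topic"
record_modification(changes, "A1", "A1b", "other")  # A1 is no longer a head
```

The last call creates a separate change named "other", because "topic"
has A1 only in its history and not at its head.
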
> +Copy commands
> +-------------
> +Copy commands (cherry-pick, merge --squash) create a new meta-commit that
> +references the old commits as origin parents. Besides the fact that the new
> +parents are tagged differently, copy commands work the same way as modify
> +commands.

It is unclear what benefit we will get by separating "Copy" commands
from "Modify" commands.  "checkout A && cherry-pick B" may make a
new copy of the edit the commit at the tip of branch B wanted to
make at the tip of branch A, but "commit --amend" is the same, in
that it makes a new copy of the edit the commit at the tip of the
current branch wanted to make, and the original copy is available in
both cases. It is just that the original of "cherry-pick B" is
slightly easier to access (i.e. it is still at the tip of branch B,
until the branch gains new commits on top of it) than the original
of "commit --amend" (i.e. the user needs to know that @{1} is the
previous state). Shouldn't all commands that create a new commit
object using some existing material (i.e. not from scratch) be
treated equally, without splitting them into two camps?

IOW, the above explains that the new parents are tagged differently,
but it does not explain why it is a good idea to do so.



