Re: DEVEL: Help with feature implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/18/2021 7:54 PM, Antonio Russo wrote:
> On 1/18/21 1:58 PM, Derrick Stolee wrote:
>> On 1/18/2021 2:31 PM, Aiyee Bee wrote:
>>> Hi Antonio and Derrick!
>>>
>>>> I think what you really want is --full-history --simplify-merges [1]. This
>>>> will show the merges that "fork" the history into parallel tracks where
>>>> at least two of them contain interesting commits.
>>>
>>> It doesn't look like the implementation of --simplify-merges helps much
>>> here. That makes its decision on basis of the parents of the commit, which is
>>> simple to do as it's information attached freely to each commit. I think the
>>> problem here would be figuring out, given any commit, how many of its children
>>> are "relevant" commits.
>>
>> You should definitely give this a try instead of assuming things about the
>> implementation. The algorithm uses a lot of "simplifying" that makes it look
>> like the decision is a local one. However, I assure you that is not the case.
> 
> As a side note, would this list be willing to look at patches that remove
> the need to use revs->limited?  Adding new features would be much easier if
> we could restrict git to use streaming algorithms for these simplifications.

I would _love_ to see patches that remove that bit (without modifying
the behavior).

Fair warning: I definitely spent a few weeks attempting to do any amount
of reducing the depth one needs to walk in order to compute the
--simplify-merges history, but a sufficiently-complicated branch history
makes it nearly impossible to gain a benefit.

>> Please assemble a test case that demonstrates the behavior you want and how
>> that is different from what is present in --simplify-merges.
> 
> I can't figure out how to get the behavior from --simplify-merges, which is
> described as
> 
> 	Additional option to --full-history to remove some needless
> 	merges from  the resulting history, as there are no selected
> 	commits contributing to this merge.
> 
> It seems that the desired behavior is to include commits which are parents to
> multiple branches.  Here is an example:

Thank you for these examples. They clearly show that I misread your
ask, because you're not looking for "merge commits" but instead you
are looking to show the "merge bases" as history is walking.

Sorry for misinterpreting your request, then doubling down on it.

> test_commit() {
>  echo >> file
>  git add file
>  git commit "$@"
> }
> 
> git init
> test_commit -m a
> test_commit -m b
> test_commit -m c
> git checkout -b fork
> test_commit -m y
> test_commit -m z
> git switch master
> test_commit -m d
> test_commit -m e
> test_commit -m f
> 
> git log --graph --oneline master fork
> 
> * 08029fd f
> * 55b09fe e
> * 83b7801 d
> | * efc204e z
> | * 316219e y
> |/  
> * 3594039 c
> * 4321987 b
> * bd44220 a
> 
> git log --graph --oneline --full-history --simplify-merges master fork
> 
> * 08029fd f
> * 55b09fe e
> * 83b7801 d
> | * efc204e z
> | * 316219e y
> |/  
> * 3594039 c
> * 4321987 b
> * bd44220 a
> 
> git log --graph --oneline --simplify-by-decoration --full-history --simplify-merges master fork
> 
> * 08029fd f
> | * efc204e z
> |/  
> * bd44220 a
> 
> git log --graph --oneline --full-history --simplify-merges master fork
> 
> * 08029fd f
> * 55b09fe e
> * 83b7801 d
> | * efc204e z
> | * 316219e y
> |/  
> * 3594039 c
> * 4321987 b
> * bd44220 a
> 
> git --version
> git version 2.30.0
> 
> I can't seem to get commit c, the crucial fork, to show up with simplifications with this mechanism.
> Am I missing something here?

In your example, you are not specifying a path. In this case, you are
really looking for "git merge-base master fork". You could also use
"git log --boundary master...fork" to show everything up to and
including 'c'.

Now, if you specify a pathspec, then 'git merge-base' isn't going to
help. That becomes a technically interesting problem.

The biggest reason that "git log" doesn't show this commit 'c' easily
is because...it's not really that important. When that commit was
created, it didn't "know" that it would be a common base of two
diverging branches. By surfacing the commit, we are very unlikely to
present the user with information that is helpful.

One way to expose the "first layer" of these merge bases is with the
--boundary option.

Here is an example of two branches in the Git ecosystem that can
present an interesting real example: code in builtin/gc.c has been
changed in both 'master' from origin (https://github.com/git/git) and
in 'main' from windows (https://github.com/git-for-windows/git):

$ git log --no-decorate --oneline --graph origin/master...windows/main  -- builtin/gc.c
* b2ace18759c Merge branch 'ds/maintenance-part-4'
* c977ff44073 Merge branch 'pk/subsub-fetch-fix-take-2'
*   95124ead43a Merge branch 'main' into maintenance-and-headless
|\  
| * ff61c9b1f49 Start the merging-rebase to v2.30.0-rc2
| * afb16063390 Start the merging-rebase to v2.30.0-rc0
| * 13c9d54a45b Start the merging-rebase to v2.29.0-rc0
| * c00a524ea88 strvec: rename struct fields
| * a864a84f1ee strvec: fix indentation in renamed calls
| * 12922aa02db strvec: convert builtin/ callers away from argv_array name
| * 86f1b0f0e04 strvec: rename files from argv-array to strvec
* 5bef0904826 git maintenance: avoid console window in scheduled tasks on Windows

Something that is hard to see from this picture is that the first two
commits are not actually connected to the third. This is more visible
when adding --boundary:

$ git log --no-decorate --oneline --graph --boundary origin/master...windows/main -- builtin/gc.c
*   b2ace18759c Merge branch 'ds/maintenance-part-4'
|\  
* \   c977ff44073 Merge branch 'pk/subsub-fetch-fix-take-2'
|\ \  
| | | *   95124ead43a Merge branch 'main' into maintenance-and-headless
| | | |\  
| | | | *   ff61c9b1f49 Start the merging-rebase to v2.30.0-rc2
| | | | |\  
| | | | | *   afb16063390 Start the merging-rebase to v2.30.0-rc0
| | | | | |\  
| | | | | | *   13c9d54a45b Start the merging-rebase to v2.29.0-rc0
| | | | | | |\  
| | | | | | | * c00a524ea88 strvec: rename struct fields
| | | | | | | * a864a84f1ee strvec: fix indentation in renamed calls
| | | | | | | * 12922aa02db strvec: convert builtin/ callers away from argv_array name
| | | | | | | * 86f1b0f0e04 strvec: rename files from argv-array to strvec
| | | * | | | | 5bef0904826 git maintenance: avoid console window in scheduled tasks on Windows
| | |/ / / / /  
| | | | | | o 47ae905ffb9 Git 2.28
| | | | | o d98273ba77e Git 2.29-rc0
| | | | o 1c52ecf4ba0 Git 2.30-rc0
| | | o 4a0de43f492 Git 2.30-rc2
| o | 898f80736c7 Git 2.29.2
|  /  
o / 71ca53e8125 Git 2.30
 /  
o 3797a0a7b7a maintenance: use Windows scheduled tasks

Notice that all these boundaries are actually adding "uninteresting"
bases _and_ merges.

Probably, the suggested --show-forkpoints would need to start from
something like this and simplify it further.

The problem that I see as well is what happens as the history itself
branches repeatedly. Consider the history of builtin/gc.c again, this
time with a single starting ref. (I have inserted places where your
fork points might add commits to the first few results.)

$ git log --no-decorate --oneline --graph origin/master -- builtin/gc.c
*   b2ace18759c Merge branch 'ds/maintenance-part-4'
|\  
| * 3797a0a7b7a maintenance: use Windows scheduled tasks
| * 2afe7e35672 maintenance: use launchctl on macOS
| * 31345d5545e maintenance: extract platform-specific scheduling
* | 66dc0a3625e gc: fix handling of crontab magic markers
* |   f2a75cb312d Merge branch 'rs/maintenance-run-outside-repo'
|\ \  
| * | e72f7defc4f maintenance: fix SEGFAULT when no repository
* | |   d702cb9e890 Merge branch 'ds/maintenance-part-3'
|\ \ \  
| * | | 483a6d9b5da maintenance: use 'git config --fixed-value'
* | | |   1c04cdd424b Merge branch 'ab/gc-keep-base-option'
|\ \ \ \  
| |_|/ /  
|/| | |   

* | | |  (some fork point here)

| * | | 793c1464d3e gc: rename keep_base_pack variable for --keep-largest-pack
* | | | a1c74791d5f gc: fix cast in compare_tasks_by_selection()
| |/ /  
|/| |   

* | |    (some fork point here)

* | |   7660da16182 Merge branch 'ds/maintenance-part-3'
|\ \ \  
| | |/  
| |/|   

| * |    (some fork point here)

| * | 61f7a383d3b maintenance: use 'incremental' strategy by default
| * | a4cb1a2339c maintenance: create maintenance.strategy config
| * | 2fec604f8df maintenance: add start/stop subcommands
| * | 0c18b700810 maintenance: add [un]register subcommands
| * | b08ff1fee00 maintenance: add --schedule option and config
* | |   902f358555c Merge branch 'rs/clear-commit-marks-in-repo

When using file history simplification, there is actually a very
simple way to detect whether a commit is a fork point (assuming
that you are walking in generation number order):

- As you walk and see that commit C is not TREESAME to commit D,
  increment a "walked indegree" count on D. (Note: this would be
  different from the current "indegree" slab used in topo-order
  calculations.)

- As you visit a commit, check its indegree. If this is larger
  than 1, then it is a forkpoint of two "interesting" histories.

This entire procedure relies on the fact that simplified history
only visits multiple parents if the merge commit is not TREESAME
to _any_ parent. Otherwise, it walks only to its first TREESAME
parent, "simplifying away" the other parents.

To implement your fork-points in the case of --full-history
or --simplify-merges would require something much more sophisticated.

   (taking an extra pause here to shift gears)

But I think the --show-forkpoints option needs to be justified a
bit better. I'm still not convinced that this is actually
interesting information, unless you are literally looking for a
merge base or a log boundary, in which case there are ways to
expose those values.

The example of --show-pulls has such a justification:

  If a user wants to see which merge commits introduced the
  interesting commits into this history, then --show-pulls
  presents those merges. Such merges might have extra
  information in their commit messages, such as pull request
  IDs, topic branch names, or information about tricky merge
  conflict resolutions.

Is there something similar to justify --show-forkpoint?

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux