Re: [PATCH] git-rev-list: add --exclude-path-first-parent flag

On Sat, Apr 17, 2021 at 12:22 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Jerry Zhang <jerry@xxxxxxxxxx> writes:
>
> > On Fri, Apr 16, 2021 at 5:45 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >>
> >> Jerry Zhang <jerry@xxxxxxxxxx> writes:
> >>
> >> > Add the --exclude-path-first-parent flag,
> >> > which works similarly to --first-parent,
> >> > but affects only the graph traversal for
> >> > the set of commits being excluded.
> >> >
> >> >    -A-------E-HEAD
> >> >      \     /
> >> >       B-C-D
> >> >
> >> > In this example, the goal is to return the
> >> > set {B, C, D} which represents a working
> >> > branch that has been merged into main branch
> >> > E. `git rev-list D ^E` will end up returning
> >> > no commits since the exclude path eliminates
> >> > D and its ancestors.
> >> > `git rev-list --exclude-path-first-parent D ^E`
> >> > however will return {B, C, D} as desired.
> >>
> >> It is not clear why you want to have this, instead of doing a more
> >> obvious "D..E^".  Even better is "E^..E", which is often what you
> >> want when viewing a history like my 'seen' that is a straight-line
> >> into which tips of branches are merged.
> > My motivation is to find the point at which a release branch forked off from
> > a main branch, even though the release branch could have been merged
> > into the main branch multiple times since it was forked off.
> >
> > If we add another merge from release to main, it will be more clear
> > that those give different results:
> >
> >         -A-----E-F-main
> >           \   / /
> >            B-C-D-release
> >
> > `git rev-list --exclude-path-first-parent release ^main` returns {B, C, D}.
> > I've added commit F to show that we don't necessarily have info on E,
> > there could be many commits between it and the tip of main.
>
> OK, you meant to deal with repeated merges into integration branch.
>
> So the idea is to just name the end point merge, say F (you also
> could name D as the starting point, but see below), and
>
>  - initially mark its first parent as UNINTERESTING (i.e. E), and
>    other parents as INTERESTING (i.e. D).
>
>  - run the revision traversal machinery, but when propagating the
>    UNINTERESTING bit, give it only to the first parent.  The second
>    and later parents won't become UNINTERESTING.
>
>  - stop after we exhaust INTERESTING commits.
>
> It would probably work for your idealized topology, but I do not
> know what happens when there are criss-cross merges.  In the revised
> picture, you are merging down from the B-C-D chain into the
> mainline, but once the B-C-D chain becomes longer and diverges too
> much from the mainline, it becomes tempting to break the "merge only
> in one direction" discipline and merge back from the mainline, to
> "catch up", and such a merge will have the history of B-C-D line of
> development as its first parent.  Would that screw up the selection
> of which line of development is uninteresting?
Yeah, this flag (like the --first-parent flag) is mainly useful
because "git merge" always records the "branch you're on" as parent 1
and the "branch being merged in" as parent 2. It is possible to break
this assumption, either with commit-tree or by merging while on one
branch and pushing the result to another, but then the user should
understand the consequences of doing so. In our case this can't happen,
because a server handles all merges into the main branches.
>
> >> > Add the --exclude-path-first-parent flag,
> >> > which works similarly to --first-parent,
> >> > but affects only the graph traversal for
> >> > the set of commits being excluded.
> >> >
> >> >    -A-------E-HEAD
> >> >      \     /
> >> >       B-C-D
>
> In any case, it was totally unclear from the proposed log message,
> and the overlong option name that does not say much did not help me
> guess what you wanted to do with it.  Specifically, it is not clear
> what "exclude" means (we do not usually use the word in the context
"Exclude" appears in the first paragraph of the man page for git rev-list:
    "List commits that are reachable by following the parent
     links from the given commit(s), but exclude commits that
     are reachable from the one(s) given with a ^ in front of
     them. The output is given in reverse chronological order
     by default."
It appears at least five more times in the man page with the same meaning.
> of revision traversal), and when we talk about "path" in the context
> of revision traversal, we almost always mean the paths to the files,
> i.e. pathspec that limits and simplifies the shape of the history.
"Path" is also used in the same man page, in the name of the
--ancestry-path flag. I agree that it could be ambiguous, though, so
perhaps "chain" would be better.
> Also, it claims that it works similarly to --first-parent, but what
> you are doing is to propagate UNINTERESTING bit on the first-parent
> chain, which ends up showing the side branch (i.e. B-C-D chain),
> without showing the commits on the first-parent chain (A and E).
>
> What are the words that convey the idea behind this operation
> clearly at the conceptual level?  Let's think aloud to see if we can
> come up with a better name.
>
>  * first parents are uninteresting
>
>  * show commits on side branch(es)
>
>  * follow side branch.
>
> I think that is closer to the problem you are solving, if I
> understand what you wrote above correctly.
>
> Perhaps --show-side-branch or --follow-side-branch?  I dunno.
For my particular use case, I combine it with --first-parent and a
single include commit and a single exclude commit to show the commits
on the "side branch" of the include commit. But if you specify multiple
commits on either side, or don't use --first-parent, the behavior is
different, and I don't think "--side-branch" describes it well in those
cases.

Since I don't believe I can predict every use case for the flag,
I'd rather name it by what it "does" than by what it is "for".
If we're concerned about length, maybe "--first-parent-not" could
get the meaning across:
- for "rev-list --first-parent A --not B", only first parents are
visited along A's ancestry;
- for "rev-list --first-parent-not A --not B", it would be reasonable
to expect that, since B is a "not" commit, only first parents are
visited along B's ancestry.
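To make the proposed semantics concrete, here is a minimal Python sketch
that models them as a set difference: everything reachable from the
positive tips, minus everything reachable from the ^ tips when only
first parents are followed. This is a toy model under stated
assumptions, not git's revision walker: the history is a hypothetical
dict mapping each commit to its parents (first parent first),
`reachable` and `rev_list` are illustrative names, and the result is an
unordered set rather than rev-list's reverse chronological output. The
graph encodes the second topology above, with F as main and D as
release.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy model: each commit maps to its parents, first parent first,
# mirroring how "git merge" records the branch you were on as parent 1.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Iterable[str], first_parent_only: bool) -> Set[str]:
    """Collect every commit reachable from tips, optionally following only first parents."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents = graph.get(commit, [])
        # With first_parent_only, a merge contributes just its first parent.
        stack.extend(parents[:1] if first_parent_only else parents)
    return seen


def rev_list(graph: Graph, include: Iterable[str], exclude: Iterable[str],
             exclude_first_parent: bool = True,
             include_first_parent: bool = False) -> Set[str]:
    """Set-difference model of 'rev-list <include> ^<exclude>' under the proposed flag.

    exclude_first_parent models the proposed flag (first parents only on the ^ side);
    include_first_parent models combining it with --first-parent on the include side.
    """
    wanted = reachable(graph, include, include_first_parent)
    unwanted = reachable(graph, exclude, exclude_first_parent)
    return wanted - unwanted


# Second topology from the thread: E = merge(A, C), F = merge(E, D).
# F plays the role of main, D the role of release.
history: Graph = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["A", "C"],
    "F": ["E", "D"],
}

print(sorted(rev_list(history, ["D"], ["F"])))                              # ['B', 'C', 'D']
print(sorted(rev_list(history, ["D"], ["F"], exclude_first_parent=False)))  # []
```

Setting `exclude_first_parent=False` corresponds to plain
`git rev-list release ^main`, which returns nothing because D and its
ancestors are all reachable from main. Recording F's parents in the
other order (F = ["D", "E"], as after merging while on release and
pushing to main) makes the first-parent walk from main run down the
release line, and the result is empty again; this is the parent-order
dependence discussed above.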

Overall, I don't think any name can be so clear that the user
can avoid reading the man page anyway.


