Regarding "git log" on "git series" metadata

After your talk at LPC2016, I was thinking about your proposal to
give an option to hide certain parents from "git log" traversal.

While I do not think we would terribly mind a new feature in the
core to support third-party additions like "git series" better, I
think this particular one is a big mistake that we shouldn't take.

For those listening from the sidelines, here is my understanding of
"git series":

 * "git series" wants to represent a patch series evolution.  It is
   a history of history, and each element of this evolution is
   represented by:

   - a commit object, that is used to describe what this reroll of
     the topic is about, and its parent links point at previous
     rerolls (it could be a merge of two independent incarnations of
     a series).

   - the tree contained in the commit object records the base commit
     where the topic forks from the main history, and the tip commit
     where the topic ends.  These are pointers into the main history
     DAG.

   - the tree may have other metadata, an example of which is the
     cover letter contents to be used when the topic becomes ready
     for re-submission.  There may be more metadata you would want
     to add in the future versions of "git series".

   Needless to say, the commits that represent the history of a
   series record trees that are shaped completely differently from
   those in the main history.  The only relation between the series
   history and the main history is that the former has pointers into
   the latter.

 * You chose to represent the base and tip commit objects as gitlinks
   in the tree of a series commit, simply because that was an already
   implemented way to record a commit object name in a tree.

 * However, because gitlink is designed to be used for "external"
   things (the prominent example is submodule), recording these as
   gitlinks means they are not protected from GC.  As the series
   progresses and the topic in the main history is rewound and
   rewritten, the base and tip recorded in the older part of the
   series history become unreachable from the main history and will
   eventually be pruned.  Because you want to make sure that the base
   and tip objects stay in the repository even after the topic branch
   in the main history gets rewound, this is not what you want.

 * In order to work around that reachability issue, the hack you
   invented is to add the tip commit as a "parent" of a commit that
   represents one step in the series.  This may guarantee the
   reachability---as long as a commit in a series history is
   reachable from a ref, the tip and base commits will be reachable
   from there even if they are rebased away from the main history.
   But of course, there are downsides.

 * Due to this hack, feeding "gitk" (or "git log") a commit in the
   series history will give you nonsense results: the traversal
   follows the phony parents into the main history, whose commits you
   are not interested in when looking at the series history.

 * Because of the above, you propose another hack to tell the
   revision traversal machinery to optionally omit a parent commit
   that appears as a gitlink in the tree.

I think this is backwards.  The root cause of the issue you have
with "gitk" is that you added something that is *NOT* a parent to
your commit.  We shouldn't have to add a mechanism to filter out
something that shouldn't have been added there in the first place.
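The reachability problem and the side effect of the phony-parent workaround can be reproduced in a small toy model.  This is a sketch, not git's object format: `Commit`, `walk_parents`, `build` and the commit names are all hypothetical, and a single parents-only walk stands in for both a gc reachability walk and a "git log" traversal.  (In real git, a gitlink is a tree entry with mode 160000 that records a commit object name, and gc does not follow it.)

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Toy commit: parent ids plus a tree of name -> (entry type, commit id)."""
    parents: list
    tree: dict = field(default_factory=dict)


def walk_parents(objects, starts):
    """Collect commits reachable from `starts` by following parent links only.

    Stands in for both a gc-style walk and a "git log" traversal in this toy;
    gitlink tree entries are never followed, just as for submodules.
    """
    seen, stack = set(), list(starts)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid].parents)
    return seen


def build(phony_parents):
    """Topic B -> T1 rewritten to B -> T2; series steps S1 and S2 record them."""
    objects = {
        "B": Commit([]),        # base of the topic
        "T1": Commit(["B"]),    # tip of the first version of the topic
        "T2": Commit(["B"]),    # tip after the topic was rewritten
    }
    objects["S1"] = Commit(
        ["T1"] if phony_parents else [],
        {"base": ("gitlink", "B"), "tip": ("gitlink", "T1")},
    )
    objects["S2"] = Commit(
        ["S1", "T2"] if phony_parents else ["S1"],
        {"base": ("gitlink", "B"), "tip": ("gitlink", "T2")},
    )
    return objects


# Hypothetical refs: master -> B, rewritten topic -> T2, series head -> S2.
REFS = ["B", "T2", "S2"]

# Plain gitlinks: the old tip T1 is unreachable, so gc would prune it.
print("T1" in walk_parents(build(False), REFS))    # False
# Tip recorded as a phony parent: T1 survives...
print("T1" in walk_parents(build(True), REFS))     # True
# ...but walking the series history now drags in main-history commits.
print(sorted(walk_parents(build(True), ["S2"])))   # ['B', 'S1', 'S2', 'T1', 'T2']
```

The same parent links that keep T1 alive are what make a log of the series history wander into B, T1 and T2.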

I am wondering if an alternative approach would work better.

Imagine we invent a new tree entry type, "gitref", that is similar
to "gitlink" in that it can record a commit object name in a tree,
but unlike "gitlink" it does imply reachability.  And you do not add
phony parents to your commit object.  A tree that has "gitref"s in
it is about annotating the commits in the same repository (e.g. the
tree references two commits, "base" and "tip", to point into a slice
of the main history).  And it is perfectly sensible for such a
pointer to imply reachability---after all it serves different
purposes from "gitlink".
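Under a similar toy model, the intended split could be sketched as follows.  "gitref" is the hypothetical entry type proposed above; `Commit`, `gc_reachable`, `log_walk` and the commit names are invented for illustration.  The gc-style walk follows "gitref" entries (but still skips "gitlink"), while the log-style walk keeps following parent links only, so the series commits need no phony parents.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Toy commit: parent ids plus a tree of name -> (entry type, commit id)."""
    parents: list
    tree: dict = field(default_factory=dict)


def gc_reachable(objects, refs):
    """gc-style walk: follow parents and "gitref" entries, skip "gitlink"s."""
    seen, stack = set(), list(refs)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        commit = objects[oid]
        stack.extend(commit.parents)
        stack.extend(target for kind, target in commit.tree.values()
                     if kind == "gitref")
    return seen


def log_walk(objects, start):
    """A "git log"-style walk: parents only, so tree entries never show up."""
    seen, stack = set(), [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid].parents)
    return seen


OBJECTS = {
    "B": Commit([]),        # base of the topic
    "T1": Commit(["B"]),    # old tip, rewound away from the main history
    "T2": Commit(["B"]),    # current tip
    # Series commits: real parents are only previous rerolls.
    "S1": Commit([], {"base": ("gitref", "B"), "tip": ("gitref", "T1")}),
    "S2": Commit(["S1"], {"base": ("gitref", "B"), "tip": ("gitref", "T2")}),
}
REFS = ["B", "T2", "S2"]    # hypothetical: master, topic, series head

print("T1" in gc_reachable(OBJECTS, REFS))   # True: the old tip is kept
print(sorted(log_walk(OBJECTS, "S2")))       # ['S1', 'S2']: series history only
```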

Another alternative that I am negative about (but is probably a
better hack than how you abused the "parent" link) might be to add a
new commit object header field that behaves similarly to "parent"
only in that it implies reachability.  But recording the extra
parent in commit object was not something you wanted to do in the
first place (i.e. your series processing is done solely on the
contents of the tree, and you do not read this extra parent). If you
need to add an in-tree reference to another commit in future
versions of "git series", then with either this variant or your
original implementation you would end up needing to add more
"parent" (or pseudo-parent) entries only to preserve reachability.
At that point, I think it makes more sense to have entries in the
tree directly ensure reachability, if you want these entries to
always point at objects in the same repository.
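The bookkeeping cost of the pseudo-parent alternative can be illustrated with the same kind of toy model.  The `reach` header below is an invented name for the hypothetical new header field (nothing like it exists in git today), and `unprotected` and the "upstream" entry are likewise hypothetical.  Every commit id the series tool reads from the tree has to be duplicated in the header, and an entry added in a later version stays unprotected until someone remembers to mirror it.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Toy commit with a hypothetical reachability-only "reach" header."""
    parents: list
    reach: list = field(default_factory=list)  # followed by gc, ignored by log
    tree: dict = field(default_factory=dict)   # name -> commit id the tool reads


def gc_reachable(objects, refs):
    """gc-style walk: follow both parents and the hypothetical header."""
    seen, stack = set(), list(refs)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid].parents)
        stack.extend(objects[oid].reach)
    return seen


def unprotected(commit):
    """Tree references that the header failed to mirror."""
    return sorted(set(commit.tree.values()) - set(commit.reach))


OBJECTS = {
    "B": Commit([]),
    "T1": Commit(["B"]),
    "U": Commit([]),  # target of a hypothetical future "upstream" entry
    # The header duplicates what the tree already says, but the newer
    # "upstream" entry was not mirrored into it.
    "S1": Commit([], reach=["B", "T1"],
                 tree={"base": "B", "tip": "T1", "upstream": "U"}),
}

print(unprotected(OBJECTS["S1"]))             # ['U']
print("U" in gc_reachable(OBJECTS, ["S1"]))   # False: gc would prune it
```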

I am afraid that I may be getting two steps ahead of myself; it is
quite possible that I have overlooked something trivially obvious
that makes the "gitref" approach unworkable.



