Re: [PATCH] difftool: add support for an extended revision syntax

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 27 Mar 2009 10:36:15 -0700

David Aguilar <davvid@xxxxxxxxx> writes:

> I'm still interested in the file~<n> idea [though maybe not
> that exact syntax] and have been reading revision.c (as Junio
> suggested) to see if it can be done in a good way.

Modern git programs use setup_revisions() to parse object names and range
notation given on the command line, and leave results in revs->pending
array.  This array holds pointers to "struct object" instances, and they
are often (always?) already parsed.  This gives the caller an easy access
to the parse results.

Recently there have been two topics that made me suspect that solving them
cleanly may require breaking this model.  I haven't thought things through
for the second one yet, so please do not take this as a criticism or
pointing-out-concrete-problems in the latter series.  At least for the
latter one, it is still an unsubstantiated fear.

(1) "The commit that touched this path the last time"

The objective of this extension is to accept a pathspec and come up with
the name of the commit object that touched the given pathspec recently.
There can be some variations to the parameter the feature could take, and
in the most generic form, the query would be:

 * set of commits: start digging from these;
 * pathspecs: find commits whose change touches some path that matches them;
 * N: do not stop at the latest change, but find the Nth one.

I am not good at coming up with a notation, so I'll leave the actual
syntax to other people, but you can say things like:

	git diff <<--all, Documentation t, 4>>
	git log <<master, Makefile, 10>>..next -- Makefile

The problem with this feature is this.  In order to come up with the
answer, you need to internally run:

	git rev-list --all -- Documentation t
	git rev-list master -- Makefile

and in order to compute this, the revision traversal machinery has to
not only open and parse commit objects but _rewrite_ the parent field of
these commits for history simplification purposes.  This is a destructive
operation, and after it comes back, we would need to clean them up by
unparsing these commits so that the original command that asked for its
command line arguments to be parsed can operate correctly.

At least, this can be fixed by unparsing all the objects parsed so far
when the revision command line parser finishes each such argument, and
force the caller parse them again when they read from rev->pending.

(2) "refs/replace/SHA-1 holds name of the object that replaces SHA-1"

The objective of this extension is to generalize the graft mechanism, so
that they can be shared across the usual object transfer mechanism, the
ordinary revision traversal mechanism can honor the replacement just like
they honor grafts file, while reachability machinery used by object
transfer, fsck and prune can ignore the replacement.  When refs/replace/A
has value B, a commit C that records A as its parent would behave as if B
is its parent (so "git log C" will show B as the second entry in its
output), while pushing C will notice C's true parent is A and sends A to
the other end.  By pushing refs/replace/A to the other end, you can share
the same "fake" history across repositories.

This feature has similar problem as (1).  If you say

	git rev-list master..next^^

how should next^^ be interpreted?  Because it is an input coming from the
user, we may want to honor the "graft/replace", and we need to read and
parse next, next^, and next^^ while possibly reading from refs/replace/
hiearchy.  But after getting the end result, the name of the commit object
the notation refers to, we would need to unparse the objects involved so
that the main parts of the command can start from the clean slate.

There are three issues with this:

 (1) When you ask for the data for a concrete SHA-1 (i.e. not "next^", but
     something like "9856dd811ee0f256d067b89cbccb58d944aa9c8c"), with and
     without grafts/replacements, the actual object data changes.

     This again needs at least unparsing of all the objects as in the
     <<commits, pathspecs, Nth>> case above to deal with.

 (2) Depending on the use of grafts/replacements, the interpretation of
     "next^^" changes; should we follow the true parenthood, or the
     replaced one?

     In order to give Porcelains the same freedom to honor or ignore the
     replacements, we would eventually need to expose --ignore-replace
     option to "git rev-list":

	git rev-list --ignore-replace master..next^^

     It gets tricky when --ignore-replace should take effect.  Before or
     after master..next^^ is interpreted?

 (3) Depending on the use of replacements, even the type of the object an
     extended SHA-1 expression refers to can change.  tag-foo may point at
     an object 9856dd811ee0f256d067b89cbccb58d944aa9c8c, and without
     replacements it may be a commit but the replacement mechanism is
     allowed to say it is another tag that points at a commit.

Not that the "replacement" feature is bad, it is just that the feature is
too flexible for the current codebase to handle sanely.

In order to solve these issues safely (not necessarily efficiently nor
cleanly), I think we might need to change the setup_revisions() to:

 (1) operate the way it currently does, leaving the parsed objects in
     revs->pending;

 (2) remember the objects' SHA-1 and flags in the revs->pending array;

 (3) discard all parsed objects, as revision traversals may have rewritten
     commits for history simplification purposes; and

 (4) rebuild revs->pending list by parsing the objects again, using the
     SHA-1 values we remembered in step (2)

or something like that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html