Re: [RFC] origin link for cherry-pick and revert

"Stephen R. van den Berg" <srb@xxxxxxx> · Thu, 11 Sep 2008 17:32:02 +0200

Theodore Tso wrote:
>On Thu, Sep 11, 2008 at 02:31:48PM +0200, Stephen R. van den Berg wrote:
>> Well, the train of thought here goes as follows:
>> 2. Add support to cherry-pick/revert to actually generate the field upon
>>    demand.

>"git cherry-pick -x" already generates the field you want.

Well, sort of.  In order for swift parsing it should be a real field,
i.e. it should not be an English sentence (in order to avoid people
accidentally translating it); and it should list a pair of hashes
(patches/changesets are defined by the difference between two tree
snapshots).  So it would be a -o option most likely, in order to provide
backward compatibility to the users of -x.

>> 3. Then add support to prune/gc/fsck/blame/log --graph to take the field
>>    into account.

>Um, why should "git fsck", or "git prune" or "git gc" need to
>understand about this field?  What were you saying about unclean
>semantics, again?  I thought you claimed that dangling origin links
>were OK?  So why the heck should git fsck care?  And why shouldn't
>gc/prune drop objects that are only referenced via the origin link.

Dangling origin links are ok only if the developer in charge of the
repository doesn't care about the commits/branches they point to.
The definition of a "caring developer" is formalised by the fact that
the offending commits are already present in the repository or not.

This implies that fsck will skip the field if the hashes in question are
unreachable in the current repository.
If they are reachable though, fsck will follow the link and check the
whole tree referenced by the origin link.  Obviously there are only two
conditions for an origin link: either the hash points to an unreachable
object or the hash points to a reachable object of type commit (and all
associated checks that go with any commit).

gc will preserve the commits the origin links point to once they are
reachable.  I.e. if the developer doesn't care about the commits the
origin links point to (i.e. if the branches are not reachable) then gc
just skips them, if the developer *does* care, the origin links are used
to keep those objects alive (and, of course, all their parenthood).

>> 4. Add support to filter-branch/rebase to renumber the field if necessary.

>As we discussed earlier in some cases renumbering the field is not the
>right thing to do, especially if the commit in question has already
>been cherry-picked --- and you don't know that.  Again, this is why
>prototyping it outside of the core git is so useful; it will show up
>some of these fundamental flaws in the origin link proposal.

I agree that the behaviour of especially rebase with respect to the
origin links is still something that needs to be thought through.
I'm not convinced you are right, but I'm not convinced you are wrong
either.

>> Well, and after having done steps 1 to 5, the net result is that it
>> works almost as if the field is present in the header, except that:
>> - It is now at the end of the body in the commit message.
>> - It takes more time to find and parse it.

>A proof of concept, even if it isn't fully performant, is useful to
>prove that an idea actually has merit --- which clearly not everyone
>believes at this point.

Quite.

>I'll also note that having a ***local*** database to cache the origin
>link is a great way of short-circuiting the performance difficulties.
>If it works, then it will be a lot easier to convince people that
>perhaps it should be done git-core, and by modifying core git functions.

Creating local databases for these kinds of structures feels kludgy
somehow, since the git hash objects essentially *are* a working
database.  I have not checked yet if git already has some kind of
ready-to-use local database lib inside which I could reuse for that.

>Alternatively, if you think this is such a great idea, why don't you
>grab a copy of the git repository, and start hacking the idea
>yourself?

Actually, in the first hour after posting the initial mail/proposal I
already had altered a local version of git to support the origin links
in commit.[ch], --topo-order and fsck.  Before hacking further I decided
to get some feedback first to see if someone would come up with
something better.  And they did, instead of the mainline number, I
decided that using two hashes is better.  Once the dust has settled,
I'll fill in the rest of the code.

>  If you have running code, it tends to make the idea much
>more concrete, and much easier to evaluate.

Agreed, but then again, most of the programming is done without touching
any code (the design phase), which is where we are now.  Once the design
is scrutinised (as far as possible), the coding can begin (continue).
The feedback so far was very helpful, and caused me to explore (and
dismiss) some of the alternate avenues to achieve the desired
functionality.

>  Or were you hoping to
>convince other people to do all of this programming for you?

I've never needed that so far, and will not need that here either.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html