Re: git pull suggestion

Aghiles <aghilesk@xxxxxxxxx> writes:

> Although for people used to CVS/CVN the "stash" is yet another thing
> to learn. There is also a high probability for new users to see this message
> very early when using git and the question is always the same: why can't git
> just merge with my files and show me the conflict?

There actually are two answers to this question.

One is that it is not necessarily "can't", but rather "chooses not to".
If you are limiting yourself to CVS/SVN style of development, then we
certainly could do that.

Let's review what happens in the CVS/SVN world when you "update" with a
dirty working tree.  As you are familiar with CVS/SVN, you should be able
to follow this part quite easily.

You "updated" and were in sync with the central repository at some point,
let's call it O, and have some uncommitted changes in the work tree since
then.  In the meantime, other people worked on the project and made more
commits in the central repository.  The tip of the central repository is
at commit A.

         x
        / 
    ---O---X---X---X---A

You would want your "cvs update" to end up with a topology like this:

                         x'
                        / 
    ---O---X---X---X---A

where

        x' = merge-three-way(O, x, A)

That is, the files in the work tree are updated to contain whatever
was done by other people between O and A.
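
To make "merge-three-way" concrete for a single file, here is a small
sketch using "git merge-file", which runs a three-way merge on plain
files.  The file names O.txt, x.txt and A.txt are made up for this
illustration and stand for the three versions in the picture above; a real
update would of course do this for every file that changed.

```shell
#!/bin/sh
# Illustration only: O.txt, x.txt and A.txt are hypothetical file names.
set -e
cd "$(mktemp -d)"

# O: the version you last updated to
printf 'one\ntwo\nthree\nfour\nfive\n' >O.txt
# x: your uncommitted edit (changes the first line)
printf 'ONE\ntwo\nthree\nfour\nfive\n' >x.txt
# A: the new tip of the central repository (changes the last line)
printf 'one\ntwo\nthree\nfour\nFIVE\n' >A.txt

# x' = merge-three-way(O, x, A).  The argument order is
# <current> <base> <other>; -p prints the result instead of
# overwriting x.txt.
git merge-file -p x.txt O.txt A.txt >x-prime.txt
cat x-prime.txt
```

Because the two edits touch different regions, x-prime.txt should carry
both of them.  Had they touched the same lines, "git merge-file" would
have written conflict markers and exited with a non-zero status.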

git does _not_ implement a handy Porcelain to do this, but we could script
it like this (I am only illustrating that it can be done; the reason why
git chooses not to do so comes in a later part of this message).

        #!/bin/sh
        # Usage: git-cvs-pull remote [refspec]

        # Minimal stand-in for the "die" helper the script relies on
        die () { echo >&2 "$*"; exit 1; }

        # Fetch from the other
        git fetch "$@"
        # Figure out "A", i.e. the updated commit.  Fields in FETCH_HEAD
        # are separated by tabs.
        tab=$(printf '\t')
        merge_head=$(sed -e "/${tab}not-for-merge${tab}/d" \
                -e "s/${tab}.*//" .git/FETCH_HEAD | \
                tr '\012' ' ')
        merge_head=${merge_head% }      # drop the trailing space left by tr
        # cvs/svn style pull will never have an octopus
        case "$merge_head" in
        ?*' '?*)        die "cannot merge more than one" ;;
        ?*)             ;;
        *)              die "nothing to merge" ;;
        esac

        # Make sure it is cvs/svn-style pull.  That is, our commit must
        # be an ancestor of the updated commit
        test -z "$(git rev-list "$merge_head..HEAD")" || die "you forked"

        # At this point, we know the topology is like this.
        #
        #         x
        #        / 
        #    ---O---X---X---X---A

	# Figure out the current branch
	branch=$(git symbolic-ref HEAD)

	# Checkout and detach to "A" while carrying the local changes.
        # This may leave conflicts but that is what the user is asking for.
        git checkout -m "$merge_head^0"

        # At this point, topology has become:
        #
        #                         x'
        #                        / 
        #    ---O---X---X---X---A
        #
        # We have detached HEAD at A but haven't updated the branch yet.

        case "$branch" in
        '')     ;; # detached from the beginning
        ?*)     # Move the branch to A and re-attach HEAD to it
                git update-ref -m "cvs-pull" "$branch" "$merge_head" &&
                git symbolic-ref HEAD "$branch" ;;
        esac

Note that this was written in my MUA and is obviously untested ;-) but I
think you have also been around here long enough to understand the idea.
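
The ancestry test in the script ("our commit must be an ancestor of the
updated commit") can be tried on its own.  Below is a sketch that builds a
scratch repository and runs the check before and after a local commit; the
helper name head_is_ancestor_of, the file names and the demo identity are
all made up for the illustration.

```shell
#!/bin/sh
# Illustration only: helper name, file names and identity are made up.
set -e
cd "$(mktemp -d)"
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Succeeds when HEAD has no commits that are missing from "$1",
# i.e. when HEAD is an ancestor of (or equal to) "$1".
head_is_ancestor_of () {
        test -z "$(git rev-list "$1..HEAD")"
}

git init -q .
echo O >file.txt && git add file.txt && git commit -q -m O
O=$(git rev-parse HEAD)
echo A >>file.txt && git commit -q -a -m A
A=$(git rev-parse HEAD)

# Pretend we are still at O with nothing committed locally
git checkout -q --detach "$O"
if head_is_ancestor_of "$A"; then first=ok; else first=forked; fi

# Now commit B on top of O: the history has forked
echo B >other.txt && git add other.txt && git commit -q -m B
if head_is_ancestor_of "$A"; then second=ok; else second=forked; fi

echo "before local commit: $first, after local commit: $second"
```

The first check passes because HEAD sits at O; the second fails because B
is not contained in A, which is exactly the "you forked" case.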

So that was one of the answers.  It is not "we can't do it", but rather
"in a world with the cvs/svn limitation, we could".

The other answer may initially appear a bit sad, but after you think
about it, it turns into an enlightenment, especially for people whose
brains have rotted from years and years of CVS/SVN use.

If you are not limited to CVS/SVN style of development and have made
commits since you updated from the central repository the last time,
CVS/SVN style "update" is fundamentally impossible.

Again, you "updated" and were in sync with the central repository at some
point, let's call it O, and this time, made a few commits, ending with
commit B.  You further have some uncommitted changes in the work tree
since then.  In the meantime, other people worked on the project and made
more commits in the central repository.  Again, the tip of the central
repository is at commit A.

                   x
                  /
         Y---Y---B
        / 
    ---O---X---X---X---A

First, a simple question.

What kind of topology would you want to end up with?

Think.

	... you think for five minutes ...

	(page break)

	... and then you look at the answer ...

Yes, you want to have a merge between A and B, and then have your local
change relative to M in your working tree.  In other words, the topology
should look like this:

                           x'
                          /
         Y---Y---B-------M
        /               /  
    ---O---X---X---X---A

where

	M = merge-three-way(O, B, A)
        x' = merge-three-way(B, x, M)

Again, think.  How would you deal with conflicts while coming up with M?

You cannot leave files with conflict markers in the work tree and have the
user fix them up to record M.  Quite contrary to what you insinuated, your
working directory is not a second class citizen but is a very valuable
entity, and it already has important changes between B and x.  We cannot
afford to overwrite it with a half-merge result between A and B for the
purpose of conflict resolution between A and B.

Worse yet, even if we _could_ keep the changes between B and x in the same
file while showing conflicts between A and B (perhaps the changes you made
between B and x did not overlap the region conflicted between A and B), we
still cannot write such a thing out to the working tree.  Why?  Because
then you have to sift through the changes in that file and commit _only_
the parts that are relevant to the merge between A and B while finishing
the merge to produce M, while leaving the change between B and x (which is
going to become the difference between M and x' and left in the working
tree) alone.  And that is actually the best case.  What would you do if
the conflicted region between A and B were something that you changed in
the working tree between B and x?

So the "enlightenment" part is that once you have the ability to "fork" the
history, the CVS/SVN style "edit, update, commit" cycle _fundamentally_ does
not work.  That is why "commit first and then merge" is the norm in the DVCS
world.

Now how would one deal with this then?  The answer is actually quite
simple.  Let's go back to the first picture:

                   x
                  /
         Y---Y---B
        / 
    ---O---X---X---X---A

We want to come up with a merge between A and B first to produce M, and
while we do that, we do not want to lose the valuable change between B and
x, so we _stash it away_.  Then we can use the working tree to deal with
potential conflicts while finishing the merge to produce M.  In other
words, after stashing, we can safely run "git pull" to produce M.

                   (x) --- the change is stashed away
                  /
         Y---Y---B-------M
        /               /  
    ---O---X---X---X---A

And then we can replay the stash on top of M to produce x':

                           x'
                          /
         Y---Y---B-------M
        /               /  
    ---O---X---X---X---A
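
In terms of commands, the sequence described above can be sketched as
follows.  The script builds two scratch repositories so that it can be run
anywhere; the repository names "central" and "mine", the file names and
the demo identity are made up.  The "--no-rebase" and "--no-edit" options
are there only so that the demo runs unattended with recent versions of
git.

```shell
#!/bin/sh
# Illustration only: repository names, file names and identity are made up.
set -e
cd "$(mktemp -d)"
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Commit O in the "central" repository
git init -q central
(
        cd central &&
        echo base >file.txt && echo notes >notes.txt &&
        git add . && git commit -q -m O
)

# Our clone is in sync with O
git clone -q central mine

# Commit A: other people's work lands in the central repository
(
        cd central &&
        echo theirs >theirs.txt && git add . && git commit -q -m A
)

cd mine
# Commit B: our own committed work
echo ours >ours.txt && git add ours.txt && git commit -q -m B
# x: a further, uncommitted change in the work tree
echo "local edit" >>notes.txt

git stash                         # set x aside; the work tree matches B
git pull -q --no-rebase --no-edit # merge A and B to produce M
git stash pop                     # replay x on top of M to get x'

git status --short
```

At the end HEAD is the merge M with both B and A as parents, and the edit
to notes.txt is back in the work tree as an uncommitted change.  If the
merge had conflicted, the work tree would have been free for resolving it,
and "git stash pop" would be run only after M is committed.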


And the final answer (yes, I said there are two answers to the original
question, and I have already given two, but the above description raises
another question: "why does git choose not to implement the logic of the
first answer in the fast-forward case, aka cvs/svn style?") is that it
simply is not worth it to special-case "I didn't commit and ran pull
again".  The workflow that results in such a case would look like this:

	$ git pull
        ... you are in sync with the other end ...
        $ edit
        $ edit
        $ edit
        $ edit
        $ edit
        $ edit
        ... keep working forever _without ever committing_ ...
        $ git pull

which goes against the distributed nature of the system you are using.

Worse yet, once you have committed between these pulls, even once, the
simple-minded "cvs/svn update" style fundamentally will not work.
Rather than training the users with "If you didn't commit, then you can do
'pull' many many times, but once you commit, then you have to do something
different", which is not very useful anyway, it is better to teach the
more general "forked" case, because the general case solution will also
work in the fast-forward case.

Now, the above inevitably invites the question "then why doesn't 'pull'
automatically stash and then unstash?".  I think the answer is obvious if
you think about it, and it is getting late, so I'll leave that as an
exercise to the readers, with a pictorial hint.

                   C-------M
                  /       /
         Y---Y---B       / 
        /               /
    ---O---X---X---X---A
