Re: [PATCH] diffcore-rename: don't consider unmerged path as source

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 17 Mar 2011, Junio C Hamano wrote:

> Martin von Zweigbergk <martin.von.zweigbergk@xxxxxxxxx> writes:
> 
> > The output from 'git status' and 'git diff-index --cached -M' is
> > broken when there are unmerged files as well as new files similar to
> > the unmerged file (in stage 1).
> >
> > When two copies have been created the output is:
> >
> > $ git status -s
> > C  original -> exact-copy
> > R  original -> modified-copy
> >  U original
> 
> The first column of
> status output is about comparing the HEAD and the index, and the second
> column is about comparing the index and the working tree

According to git-status's man page, that is true for paths without
merge conflicts only. For paths with merge conflicts, the two letters
represent each side of the merge. That is why I thought " U" (note the
SP) was invalid. Reading the documentation again, I see that ' '
actually means 'unmodified'. I'm still not sure it is correct
though. First, "original" was modified on both sides in this case,
which is of course the reason it had a merge conflict. Second, the
meaning of " U" is not described in the table in the man page. I have
always assumed that the table is supposed to be complete.

Either way, the fact that "original" is completetly missing from the
verbose 'git status' output must surely be incorrect.

> When comparing an unmerged index with the working tree, the comparison
> itself does not make much sense.  When producing textual diff,
> "diff-index" tends to give the difference between stage #2 and the working
> tree in order to show the difference from "our" version and the result of
> the incomplete merge, but when we need to show the result concisely in the
> "status -s" output, the fact that the index is unmerged is more important
> than the incomplete merge result in the working tree is different from the
> original, so we show "U".
> 
> So I think "U" is perfectly good there.

For completeness, I should say that I think the output from 'git
diff-files' is correct (and unchanged by this patch); only 'git
status' and 'git diff-index' showed incorrect (as far as I can see)
output.

$ git diff-files --name-status
U	original
M	original

Same output with '-M' or '-C'.

> About the comparison between HEAD and index, "original" in HEAD is copied
> to "exact-copy" in the index, and "modified-copy" in the index has a very
> similar contents as "original" in HEAD. It may be a bug that the latter is
> shown as "R" and I suspect that is because the code mistook the unmerged
> entry in the index as missing.  Turning that "R" to "C" may be worth
> doing. Change the code that says "ah, the index is unmerged at this path,
> so treat it as if it is not there" to "if the unmerged path does not have
> stage #2 entry, it is missing".

Exactly. That is what this patch tries to do (although not for 'git
status', but for 'git diff-index --cached -C', see later answers).

Speaking of stage #2, I think my references to stage #1 in the commit
message should have been referring to stage #2.

> > There are several errors here: 1) "original" has invalid status " U",
> > 2) "modified-copy" is listed as a rename, even though its source
> > ("original") still exists, and 3) "exact-copy" is detected as a copy,
> > even though 'git status' is only supposed to show renames
> 
> The prose is good but if you illustrated a bug with a command output,
> please follow it up with another command output that you think is the
> right output.

Makes sense. This is the output after this patch:

$ git status -s
A  exact-copy
A  modified-copy
UU original

> I don't think anybody said "only supposed to show renames", but I suspect
> that the recent lt/diff-rename topic may affect this part of the output.

wt-status.c sets rev.diffopt.detect_rename = 1 (==
DIFF_DETECT_RENAME). I think that means that it should only detect
rename, not copies. With the patch applies, the files show up as added
(as show in above). I tried changing it to DIFF_DETECT_COPY and then,
not surprisingly, 'git status' showed them as copies instead (even
with the patch applied).

While on the subject, I almost asked why 'git status' does not detect
copies, but I never did. I kind of suspected that it might be too
slow. Could there be some other reason?

> > Fix these problems by making diffcore-rename not consider unmerged
> > paths as source for rename detection. For simplicty, don't consider
> > the path independent of the type of merge conflict, even if the path
> > is deleted on at least one side. Still consider unmerged paths as
> > source for copy detection.
> 
> I don't think the part after "Still..." is justified enough.
> 
> For that matter, everything after "For simplicity..." is not justified
> enough. What are you sacrificing in return for that simplicity?  "Ideally
> we should show X but because we don't consider these we end up getting Y
> but that still is better than what we get today which is Z" is missing.

I wasn't really sure how it should work, so I just punted and said
"For simplicity" :-). What I was not sure about was what stage to
compare to. For example, for "deleted by us" conflicts, the most
likely outcome is that the file will be deleted, so it could make some
sense to make it eligible for rename detection. This would mean that
if a similar file was added by "them", it would be described as a
rename in the git-status output. The same applies to "deleted by us",
but if I understand correctly, we don't have the stage #3 content
available to compare to at this point in the code, and even if we did,
it would be inconsistent with how the diff is done with stage #2
everywhere else.

I don't have time to think more about this right now, but I hope the
above explains what I simplfied from. I'll think a bit more about this
later and might get back with another mail.

> > Also not sure about the "while at it" stuff...
> 
> Because "while at it" is by definition stuff that is not essential, don't
> do "white at it" if you are not sure, if it adds unnecessary noise and
> burden on reviewers.

I changed it because I thought it is slightly more clear that
DIFF_DETECT_RENAME, as opposed to 1, does not detect renames. I could
either explain that reason in the commit message, do it in a separate
patch, or just drop it completely. I don't have a strong
opinion. Which would you prefer?

Thanks for reviewing. I will try to include much of the above in the
commit message in a re-roll.


/Martin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]