Re: git blame --ignore-rev does not work

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi -

On 10/2/20 6:52 PM, Harrison McCullough wrote:
Thank you for your feedback! I do have some more information to
provide that is confusing.

I tried running `git blame -w`, and this correctly ignores the
revision I tried to ignore with `--ignore-rev`, etc. So it appears
that the algorithm to attribute lines to commits is capable of
ignoring the commit in question (in the lines I've inspected) but it's
not doing it when I use the "ignore-rev" capability—only the "ignore
whitespace changes" capability.

Does anyone have any ideas about why that may be the case? Does the
"ignore whitespace" and "ignore commit" algorithms use different
logic? I would have assumed that they shared most of the logic.

Yeah, the logic is a little different. IIRC, the "whitespace ignore" affects individual diff_hunks that are generated during the blame process, generated by calling into xdiff. That's when blame tries and figure out what lines change from target/child to parent. So the "blame chunk" that gets analyzed is different, depending on whitespace-ignore or not.

The "blame ignore" logic happens after that, and it operates on the output of xdiff. If the 'target' is one of the ignored commits, it looks at those chunks produced by xdiff (differences from target commit to parent) and for each line in the target commit, attempts to figure out what line in the parent to match it to.

The matching process is imperfect. It uses a fingerprinting algorithm to find a likely candidate line in the parent's diff hunk. The fingerprinting is based on matching the two-character pairs in each string. (I didn't write the fingerprinting, btw). It does some ranking of the lines, picks the best one, and a few other 'search quality' things. etc. If it fails to find a matching line in the hunk, it'll do a simple O(n) scan in the entire file. But when it looks in the entire file, it has a threshold of similarity, so you don't match arbitrary things. (Threshold is 10 two-letter pairs, I think).

Anyway, my guess is that when you use the ignore-whitespace option, it changes the diff hunks enough that blame_chunk() gives a different result. This could be because there are larger diff hunks (more search space for the "good" initial scan). Or maybe because you have a different set of two-character pairs or something.

I would love to provide a concrete example, but the only time I've
been able to reproduce this is with proprietary code. I'll try to
create a new repository with a similar commit and see if I can ignore
it there.

I'd be glad to take a look, though I don't know how fixable it is. My guess is you're hitting right on the edge of the "find a similar line inside a hunk" and "couldn't find a good change in the entire file" threshold, and any change would just be tuning it for your repo. But maybe not, and I'd like for --ignore-rev to work for you.

For the information of those interested, the commit I'm trying to
ignore is a "reformat the world" commit. We introduced the tool
"astyle" into our codebase, and as part of that effort I ran astyle
over our entire codebase.

Same situation for me - reformatted a lot of code and didn't want to break blame. =) That was the original motivation. I've actually used it a lot for other things now, such as finding out which commit changed a line in a particular way. git blame, look around. git show XXX, realize i want and older commit, git blame --ignore-rev XXX, etc.

Is it possible that the commit isn't being ignored because it's too
big? It did change over 1300 files....

That should be OK. The algorithm doesn't care about the other files in the commit - only the one that you are blaming.

Thanks,

Barret



On Fri, Oct 2, 2020 at 4:44 PM Barret Rhoden <brho@xxxxxxxxxx> wrote:

Hi -

On 10/2/20 5:40 PM, René Scharfe wrote:
[snip]
I don't know if these revisions are not ignored due to bugs or because
the feature just isn't strong enough, yet, but I would expect your
particular case to be represented by at least one of these...

Correct.

When skipping a revision, the algorithm attempts to find another
revision that could be responsible for the change.  But it might not be
able to find anything.  Consider a commit that just adds a few lines to
a file with only 'foo' and 'bar':

commit: "Adding Lines"
-------------
   foo
+No commit
+ever touched
+these lines
   bar

If we ignored that revision, which commit do we assign those lines to?
If they were "similar" to the existing lines, then the algorithm might
match.  But in general, we can't find 'correct' (as defined by a user)
matches for arbitrary changes.

I usually run git with these settings:

[blame]
          ignorerevsfile = .git-blame-ignore-revs
          markIgnoredLines = true
          markUnblamableLines = true

Which points out when --ignore-revs is doing something.

Thanks,

Barret








[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux