Re: [PATCH] blame: add the ability to ignore commits

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 07 Jan 2019 15:13:02 -0800

Barret Rhoden <brho@xxxxxxxxxx> writes:

> Commits that make formatting changes or renames are often not
> interesting when blaming a file.  This commit, similar to
> git-hyper-blame, allows one to specify certain revisions to ignore during
> the blame process.
>
> To ignore a revision, put its committish in a file specified by
> --ignore-file=<file> or use -i <rev>, which can be repeated.

If I read it correctly, this gives a very limited form of -S, in the
sense that anything this can do can be expressed by using -S but the
reverse is not true, but is designed to be easier to use, in the
sense that unlike -S, this does not have to describe the part of the
history you do not have to lie about.  The documentation should say
something about these pros-and-cons to help readers decide which
feature to use.

I somehow feel that this is rare enough that it should not squat on
short-and-sweet '-i'.  We would want to reserve it to something like
"--ignore-case", for example.

> The file .git-blame-ignore-revs is checked by default.

Giving the projects a way to easily help participants to share the
same distorted view of the history is a good idea, but I do not
think we should allow projects to do so to those who merely clone it
without their consent.  IOW, "by default" is a terrible idea.

Perhaps paying attention to blame.skipRevsFile configuration
variable that is set by the end user would be sufficient----the
project can ship .blame-skip-revs (or any random filename of their
choice) in-tree as tracked contents and tell its users that they can
optionally use it.

> It's useful to be alerted to the presence of an ignored commit in the
> history of a line.  Those lines will be marked with '*' in the
> non-porcelain output.  The '*' is attached to the line number to keep
> from breaking tools that rely on the whitespace between columns.

A policy decision like the above two shouldn't be hardcoded in the
feature like this, but should be done as a separate option.  By
default, these shouldn't be marked with '*', as the same tools you
said you are afraid of breaking would be expecting a word with only
digits and no asterisk in the column where line numbers appear and
will get broken by this change if done unconditionally.

In general, I find this patch trying to change too many things at
the same time, without sufficient justification.  Perhaps do these
different things as separate steps in a single series?

> A blame_entry attributed to an ignored commit will get passed to its
> parent.

Obviously, an interesting consideration is what happens when a merge
commit is skipped.  Is it sufficient to change this description to
"...will get passed to its parentS", or would the code do completely
nonsensical things without further tweaks (a possible simple tweak
could be to refuse skipping merges)?

> If an ignored commit changed a line, an ancestor that changed
> the line will get blamed.  However, if an ignored commit added lines, a
> commit changing a nearby line may get blamed.  If no commit is found,
> the original commit for the file will get blamed.

The above somehow does not read as describing a carefully designed
behaviour; rather, it sounds as if it is saying "the code does
whatever random things it happens to do".  For example, when there
is a newly added line how is "A" commit changing a nearby line
chosen (a line has lines before it and after it---they may be
touched by different commits, and before and after that skipped
commit, so there are multiple commits to choose from)?

> diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
> index 16323eb80e31..e41375374892 100644
> --- a/Documentation/git-blame.txt
> +++ b/Documentation/git-blame.txt
> @@ -10,6 +10,7 @@ SYNOPSIS
>  [verse]
>  'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental]
>  	    [-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>]
> +	    [-i <rev>] [--no-default-ignores] [--ignore-file=<file>]
>  	    [--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>]
>  	    [--] <file>
>  
> @@ -84,6 +85,20 @@ include::blame-options.txt[]
>  	Ignore whitespace when comparing the parent's version and
>  	the child's to find where the lines came from.
>  
> +-i <rev>::
> +	Ignore revision when assigning blame.  Lines that were changed by an
> +	ignored commit will be marked with a `*` in the blame output.  Lines
> +	that were added by an ignored commit may be attributed commits making
> +	nearby changes or to the first commit touching the file.

It probably deserves to be told that this option can be given
multiple times and used cumulatively (unlike usual "last one wins"
rule).

> +--no-default-ignores::
> +	Do not automatically ignore revisions in the file
> +	`.git-blame-ignore-revs`.

This should not be "opt-out" like this.

> +--ignore-file=<file>::
> +	Ignore revisions listed in `file`, one revision per line.  Whitespace
> +	and comments beginning with `#` are ignored.

Should it be capable to take two or more ignore-files?  Or should we
use the usual "the last one wins" rule?

I think we should support blame.skipRevFile configuration variable
so that the users do not have to constantly give the option from the
command line.  And with that, there is no need to have a hardcoded
filename .git-blame-ignore-revs or anything like that.

If we are to use configuration variable, however, we'd need a way to
override its use from the command line, too.  Perhaps a sane
arrangement would be

    - if one or more --skip-revs-file=<file> are given from the
      command line, use all of them and ignore blame.skipRevsFile
      configuration variable.

    - if no --skip-revs-file=<file> is given from the command line,
      use blame.skipRevsFile configuration variable.

    - regardless of the above two, pay attention to --skip-rev=<rev>
      command line option(s).

Somehow the damage to blame.c codebase looks way too noisy for what
the code wants to do.  If all we want is to pretend in a history,
e.g.

    ---O---A---B---C---D

that commit B does not exist, i.e. use a distorted view of the
history

    ---O---A-------C---D

wouldn't it be sufficient to modify pass_blame(), i.e. the only the
caller of the pass_blame_to_parent(), where we find the parent
commits of "C" and dig the history to pass blame to "B", and have it
pretend as if "B" does not exist and pass blame directly to "A"?

Thanks.  I am personally not all that interested in this yet.