Re: [PATCH 2/2] diffcore-pickaxe doc: document -S and -G properly

Ramkumar Ramachandra <artagnon@xxxxxxxxx> writes:

> The documentation of -S and -G is very sketchy.  Completely rewrite the
> sections in Documentation/diff-options.txt and
> Documentation/gitdiffcore.txt.
>
> References:
> 52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".)
> f506b8e (git log/diff: add -G<regexp> that greps in the patch text)
>
> Inputs-from: Phil Hord <phil.hord@xxxxxxxxx>
> Co-authored-by: Junio C Hamano <gitster@xxxxxxxxx>
> Signed-off-by: Ramkumar Ramachandra <artagnon@xxxxxxxxx>
> ---
>  Documentation/diff-options.txt | 38 +++++++++++++++++++++++++++++--------
>  Documentation/gitdiffcore.txt  | 43 ++++++++++++++++++++++++------------------
>  2 files changed, 55 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 104579d..2835eef 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -383,14 +383,36 @@ ifndef::git-format-patch[]
>  	that matches other criteria, nothing is selected.
>  
>  -S<string>::
> -	Look for differences that introduce or remove an instance of
> -	<string>. Note that this is different than the string simply
> -	appearing in diff output; see the 'pickaxe' entry in
> -	linkgit:gitdiffcore[7] for more details.
> +	Look for differences that change the number of occurrences of
> +	the specified string (i.e. addition/deletion) in a file.
> +	Intended for the scripter's use.
> ++
> +It is especially useful when you're looking for an exact block of code
> +(like a struct), and want to know the history of that block since it
> +first came into being: use the feature iteratively to feed the
> +interesting block in the preimage back into `-S`, and keep going until
> +you get the very first version of the block.

OK, even though I would not say "especially" nor "useful" if I were
writing it, as it is the only use case it was designed for.
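
To make the `-S` rule concrete, here is a minimal Python sketch of the criterion the patch describes: a filepair is kept when its preimage and postimage contain a different number of occurrences of the string. It is only an illustration, not git's C implementation. The names `occurrences` and `pickaxe_s` are hypothetical, and Python's `re` module stands in for POSIX extended regular expressions when `--pickaxe-regex` is simulated.

```python
import re


def occurrences(text, needle, pickaxe_regex=False):
    """Count non-overlapping occurrences of needle in text.

    With pickaxe_regex=True the needle is treated as a regular expression;
    Python's re module is only a stand-in for POSIX extended regexes here.
    """
    if pickaxe_regex:
        return len(re.findall(needle, text))
    return text.count(needle)


def pickaxe_s(preimage, postimage, needle, pickaxe_regex=False):
    """Hypothetical -S check: keep the filepair if the count changed."""
    return (occurrences(preimage, needle, pickaxe_regex)
            != occurrences(postimage, needle, pickaxe_regex))


# Feeding an exact block back in: the commit that changed the struct is
# found because the old block no longer occurs verbatim in the postimage.
block = "struct point {\n\tint x;\n};\n"
before = block
after = "struct point {\n\tint x;\n\tint y;\n};\n"
print(pickaxe_s(before, after, block))           # True

# Moving a line within the file leaves the count unchanged.
print(pickaxe_s("a\nfoo\n", "foo\na\n", "foo"))  # False
```

Digging further back would mean taking the block as it appears in the preimage of the commit just found and running the same check against older commits.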

>  -G<regex>::
> +	Look for differences whose patch text contains added/removed
> +	lines that match <regex>.
> ++
> +To illustrate the difference between `-S<regex> --pickaxe-regex` and
> +`-G<regex>`, consider a commit with the following diff in the same
> +file:
> ++
> +----
> ++    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
> +...
> +-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
> +----
> ++
> +While `git log -G"regexec\(regexp"` will show this commit, `git log
> +-S"regexec\(regexp" --pickaxe-regex` will not (because the number of
> +occurrences of that string did not change).
> ++
> +See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
> +information.

OK.
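
The example in the patch can be checked mechanically. The Python sketch below models `-G` as "compute a line diff, then grep only the added and removed lines" and `-S --pickaxe-regex` as "compare match counts", then applies both to a hypothetical file built from the two lines of the example (placed at the same spot for brevity, whereas the example separates them with `...`). `difflib.ndiff` stands in for git's own diff machinery and `re` for POSIX ERE; all function names are made up for illustration. The extra diff step is also why the G kind is the more expensive one.

```python
import difflib
import re


def changed_lines(preimage, postimage):
    """Added and removed lines of a line diff (the text -G greps)."""
    return [line[2:]
            for line in difflib.ndiff(preimage.splitlines(), postimage.splitlines())
            if line.startswith(("+ ", "- "))]


def pickaxe_g(preimage, postimage, pattern):
    """-G style: run a diff, then grep the added/removed lines."""
    rx = re.compile(pattern)
    return any(rx.search(line) for line in changed_lines(preimage, postimage))


def pickaxe_s_regex(preimage, postimage, pattern):
    """-S --pickaxe-regex style: compare the number of matches."""
    return len(re.findall(pattern, preimage)) != len(re.findall(pattern, postimage))


# Hypothetical file versions built from the two lines in the example.
pre = "{\n    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);\n}\n"
post = "{\n    return !regexec(regexp, two->ptr, 1, &regmatch, 0);\n}\n"

pattern = r"regexec\(regexp"
print(pickaxe_g(pre, post, pattern))        # True: a changed line matches
print(pickaxe_s_regex(pre, post, pattern))  # False: one match before, one after
```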

>  --pickaxe-regex::
> -	Make the <string> not a plain string but an extended POSIX
> -	regex to match.
> +	Treat the <string> given to `-S` as an extended POSIX regular
> +	expression to match.

OK.

> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index 568d757..ef4c04a 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -222,26 +222,33 @@ version prefixed with '+'.
>  diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
>  ---------------------------------------------------------------------
>  
> -This transformation is used to find filepairs that represent
> -changes that touch a specified string, and is controlled by the
> --S option and the `--pickaxe-all` option to the 'git diff-*'
> -commands.
> -
> -When diffcore-pickaxe is in use, it checks if there are
> -filepairs whose "result" side and whose "origin" side have
> -different number of specified string.  Such a filepair represents
> -"the string appeared in this changeset".  It also checks for the
> -opposite case that loses the specified string.
> -
> -When `--pickaxe-all` is not in effect, diffcore-pickaxe leaves
> -only such filepairs that touch the specified string in its
> -output.  When `--pickaxe-all` is used, diffcore-pickaxe leaves all
> -filepairs intact if there is such a filepair, or makes the
> -output empty otherwise.  The latter behaviour is designed to
> -make reviewing of the changes in the context of the whole


> +There are two kinds of pickaxe: the S kind (corresponding to 'git log
> +-S') and the G kind (mnemonic: grep; corresponding to 'git log -G').

This is good as the beginning of a second paragraph or the second
sentence of the first paragraph.  This patch loses the description
of the general purpose of this machinery that should come at the
very beginning of the section (the original had a very good description,
but it was valid only back when we had only -S; my "how about this" text
did not have a good one).

For example, the "rename" is about taking one set of filepairs and
expressing (some of) them as renames or copies by merging a deletion
filepair and a creation filepair into a rename-modify filepair, or
turning a creation filepair into a copy-modify filepair by finding a
preimage.  What does this transformation do?

Again, here is my attempt at that missing first paragraph:

	This transformation limits the set of filepairs to those
	that change specified strings between the preimage and the
	postimage in a certain way.

	-S<block of text> and -G<regex> options are used to specify
	different ways these strings are sought.  Without
	--pickaxe-all, only the filepairs matching the given
	criterion are left in the output; all filepairs are left in
	the output when --pickaxe-all is used and if at least one
	filepair matches the given criterion.

but I do not have enough time now to condense the above down to a
readable paragraph of reasonable length (I expect that the ideal
final form would be like 5-6 lines at most).

> +"-S<block of text>" detects filepairs whose preimage and postimage
> +have different number of occurrences of the specified block of text.
> +By definition, it will not detect in-file moves.  Also, when a
> +changeset moves a file wholesale without affecting the interesting
> +string, rename detection kicks in as usual, and `-S` omits the
> +filepair (since the number of occurrences of that string didn't change
> +in that rename-detected filepair).

I am not sure why it is necessary to say anything about what the
previous step (diffcore-rename) might have done.  The input of this
(or any other) step in the diffcore pipeline is a set of
preimage-postimage filepairs, and to this transformation the filename
does not matter.  Whether a file was moved (either "wholesale",
implying nothing changed, or renamed with modification at the same
time) without touching the block of text, or a file did not get
involved in any renaming, the only thing that matters is what the
preimage and the postimage in a filepair have (or do not have).

> + The implementation essentially
> +runs a count, and is significantly cheaper than the G kind.  When used
> +with `--pickaxe-regex`, treat the <block of text> as an extended POSIX
> +regular expression to match, instead of a literal string.

Sure.  Is "essentially runs a count" needed, though?  The reader has
just read "number of occurrences of the specified block of text" so
it would be obvious that the implementation counts.  It may be true
that it is significantly cheaper, but because they serve different
purposes, I am not sure it is worth saying.  It is like saying that
a hammer is significantly faster to drive a nail into wood than a
screwdriver to drive a screw into wood, without saying "nail" and
"screw".  It only invites readers to use a hammer to drive a screw.

> +"-G<regular expression>" detects filepairs whose textual diff has an
> +added or a deleted line that matches the given regular expression.
> +This means that it can detect in-file (or what rename-detection
> +considers the same file) moves.

"it can" sounds as if it is always a merit, which is probably not
what you wanted to imply.

When you are trying to see how a particular line came into its current
shape, you would want to know what its previous shape was, but a
literal move will also be shown, which is noise for the purpose of
digging.
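
A small Python sketch of that noise, under the usual toy-model simplifications (Python's `difflib` instead of git's diff, `re` instead of POSIX regexes, hypothetical helper names): moving a line within a file produces a removed line and an added line that both match, so the `-G` style check fires, while the occurrence count that `-S` compares stays the same.

```python
import difflib
import re


def changed_lines(preimage, postimage):
    """Added and removed lines of a line diff; context lines are dropped."""
    return [line[2:]
            for line in difflib.ndiff(preimage.splitlines(), postimage.splitlines())
            if line.startswith(("+ ", "- "))]


def pickaxe_g(preimage, postimage, pattern):
    """-G style: does any added/removed line match the pattern?"""
    rx = re.compile(pattern)
    return any(rx.search(line) for line in changed_lines(preimage, postimage))


def pickaxe_s(preimage, postimage, pattern):
    """-S --pickaxe-regex style: did the number of matches change?"""
    return len(re.findall(pattern, preimage)) != len(re.findall(pattern, postimage))


# The call to a() is moved from the top of the file to the bottom, unchanged.
pre = "a();\nb();\nc();\n"
post = "b();\nc();\na();\n"
print(changed_lines(pre, post))        # ['a();', 'a();']
print(pickaxe_g(pre, post, r"a\(\)"))  # True: the move shows up as -/+ lines
print(pickaxe_s(pre, post, r"a\(\)"))  # False: still exactly one call
```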

> +The implementation runs diff twice
> +and greps, and this can be quite expensive.

Unlike the "count" one above which was obvious, the "runs diff and
greps hence expensive" part is worth saying.

> +When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
> +that match their respective criterion are kept in the output.  When
> +`--pickaxe-all` is used, if even one filepair matches their respective
> +criterion in a changeset, the entire changeset is kept.  This behavior
> +is designed to make reviewing changes in the context of the whole
>  changeset easier.

OK.
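
The `--pickaxe-all` rule above amounts to a filter over the filepairs of one changeset. A minimal Python sketch, assuming a hypothetical model in which a filepair is a `(path, preimage, postimage)` tuple and the S-or-G criterion is passed in as a predicate (`diffcore_pickaxe` and `s_kind` are illustrative names, not git functions):

```python
def diffcore_pickaxe(filepairs, matches, pickaxe_all=False):
    """Filter one changeset's filepairs as the documentation describes.

    filepairs: list of (path, preimage, postimage) tuples (hypothetical model).
    matches:   predicate implementing the -S or -G criterion.
    """
    hits = [fp for fp in filepairs if matches(fp[1], fp[2])]
    if not pickaxe_all:
        return hits  # only the matching filepairs survive
    return list(filepairs) if hits else []  # all or nothing for the changeset


def s_kind(preimage, postimage):
    """Stand-in -S criterion: did the number of "foo" occurrences change?"""
    return preimage.count("foo") != postimage.count("foo")


changeset = [
    ("a.c", "foo\n", "bar\n"),  # loses "foo": matches
    ("b.c", "x\n", "y\n"),      # unrelated edit: does not match
]
print([fp[0] for fp in diffcore_pickaxe(changeset, s_kind)])                    # ['a.c']
print([fp[0] for fp in diffcore_pickaxe(changeset, s_kind, pickaxe_all=True)])  # ['a.c', 'b.c']
```

With `pickaxe_all=True`, a changeset in which no filepair matches yields an empty result, mirroring the original text's "makes the output empty otherwise".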

>  
> -
>  diffcore-order: For Sorting the Output Based on Filenames
>  ---------------------------------------------------------