Re: Re* --creation-factor=100 does not show code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Junio,

On Thu, 28 Jul 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>
> > On Tue, 26 Jul 2022, Eugen Konkov wrote:
> >
> >> $ git range-diff --creation-factor=100 branch...origin/branch
> >> 1:  a87daecd47 < -:  ---------- Add mocked exchanges for ...::AutoRenew::General test
> >> -:  ---------- > 1:  36eaeb56a9 Add mocked exchanges for ...::AutoRenew::General test
> >> 2:  9594ccf145 = 2:  70681dd13b Remove a call to DB::state
> >> 3:  740903e01c = 3:  5745ae5702 Run cpanm without tests
> >> 4:  e8e6cac09c < -:  ---------- Do not use 'require'
> >>
> >> --creation-factor=101 does =)
> >>
> >> but maximum value for percentage is 100. So expected behaviour is to display range-diff when value 100 was provided
> >
> > Please see https://git-scm.com/docs/git-range-diff#_algorithm for an
> > explanation what the meaning of the factor is, and why 100 is not the
> > maximal sensible value.
>
> When I had to give a huge value to the option the last time, I think
> I used --creation-factor=999 or something.  The thing that bugged me
> in the output of "git range-diff --help" is that SYNOPSIS section
> has "--creation-factor=<factor>" but the OPTIONS heading says
> "--creation-factor=<percent>" and the word is used in description as
> well.
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
> Subject: range-diff: clarify --creation-factor=<factor>
>
> The value is not a per-cent that ranges from 0 to 100.

This sentence is a bit misleading because the factor _is_ measured in
terms of percent (not "per-cent", nor "per cent", see
https://writingexplained.org/per-cent-or-percent-difference).

The fact that we multiply by the number and divide by 100 makes it a
percentage, see e.g.
https://github.com/git/git/blob/v2.37.1/range-diff.c#L331:

		c = a_util->matching < 0 ?
			a_util->diffsize * creation_factor / 100 : COST_MAX;

The reason why I chose to write `<percent>` in the original patch
(https://github.com/git/git/commit/ba931edd284f) is to avoid confusion:
0.6 would be a factor, equivalent to 60%. But I had carefully decided not
to extend the `parse_options()` machinery to allow for floating point
numbers, hence I chose to allow specifying the factor in terms of a
percentage, which could be specified as an integer value. I _just_ noticed
a minute ago, though, that the synopsis makes the mistake of describing
the value as a `<factor>`, that's on me.

As to the original claim that percentages only go from 0-100, that is
easily refuted. If you wanted to pay $12 for something but ended up having
to pay $30, you'll end up having paid 150% more than planned. There you
are. A percentage that is greater than 100.

In this context, the explanation in
https://git-scm.com/docs/git-range-diff#_algorithm reveals how this
percentage is used: to determine the cost of _not_ pairing a patch on one
side of the range-diff with a patch on the other side, in terms that are
relative to the line count of the diff of said patch. The cost of pairing
two patches is the line count of the _diff between their diffs_.

This also explains why the default value of the creation factor is not
100% (as the original poster might have come to expect) but instead 60%, a
value that was at first picked out of thin air, but that turned out to be
reasonable: https://github.com/trast/tbdiff/commit/92ed41c84a89c.

Side note: One of my math teachers insisted on _not_ calling percent a
unit, as it is by definition unit-less: a percentage leaves the unit of
the value it modifies unchanged. Which is absolutely true. So I think we
should heed that advice here and avoid calling `%` a unit.

Ciao,
Dscho

> The SYNOPSIS section gets it right, but the body of the documentation
> said "percent" which confused readers.
>
> While we are at it, rephrase "smaller one" that corresponds to
> "larger value" earlier in the sentence to "smaller value" to be more
> explicit, to avoid misleading eyes of the readers to an unrelated "a
> large change" nearby.
>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
>  Documentation/git-range-diff.txt | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git c/Documentation/git-range-diff.txt w/Documentation/git-range-diff.txt
> index fe350d7f40..e49630e8ad 100644
> --- c/Documentation/git-range-diff.txt
> +++ w/Documentation/git-range-diff.txt
> @@ -61,11 +61,11 @@ This is known to `range-diff` as "dual coloring". Use `--no-dual-color`
>  to revert to color all lines according to the outer diff markers
>  (and completely ignore the inner diff when it comes to color).
>
> ---creation-factor=<percent>::
> -	Set the creation/deletion cost fudge factor to `<percent>`.
> +--creation-factor=<factor>::
> +	Set the creation/deletion cost fudge factor to `<factor>`.
>  	Defaults to 60. Try a larger value if `git range-diff` erroneously
>  	considers a large change a total rewrite (deletion of one commit
> -	and addition of another), and a smaller one in the reverse case.
> +	and addition of another), and a smaller value in the reverse case.
>  	See the ``Algorithm`` section below for an explanation why this is
>  	needed.
>
>
>
>




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux