Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 6, 2023 at 9:24 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Sun, Feb 05 2023, John Cai via GitGitGadget wrote:
>
> > From: John Cai <johncai86@xxxxxxxxx>
> > [...]
> > +
> > +             if (!o->xdl_opts_command_line) {
> > +                     static struct attr_check *check;
> > +                     const char *one_diff_algo;
> > +                     const char *two_diff_algo;
> > +
> > +                     check = attr_check_alloc();
> > +                     attr_check_append(check, git_attr("diff-algorithm"));
> > +
> > +                     git_check_attr(the_repository->index, NULL, one->path, check);
> > +                     one_diff_algo = check->items[0].value;
> > +                     git_check_attr(the_repository->index, NULL, two->path, check);
> > +                     two_diff_algo = check->items[0].value;
> > +
> > +                     if (!ATTR_UNSET(one_diff_algo) && !ATTR_UNSET(two_diff_algo) &&
> > +                             !strcmp(one_diff_algo, two_diff_algo))
> > +                             set_diff_algorithm(o, one_diff_algo);
> > +
> > +                     attr_check_free(check);
>
> This is a bit nitpicky, but I for one would find this much easier to
> read with some shorter variables, here just with "a" rather than
> "one_diff_algo", "b" instead of "two_diff_algo", and splitting
> "the_repository->index" into "istate" (untested):
>
>         +               if (!o->xdl_opts_command_line) {
>         +                       static struct attr_check *check;
>         +                       const char *a;
>         +                       const char *b;
>         +                       struct index_state *istate = the_repository->index;
>         +
>         +                       check = attr_check_alloc();
>         +                       attr_check_append(check, git_attr("diff-algorithm"));
>         +
>         +                       git_check_attr(istate, NULL, one->path, check);
>         +                       a = check->items[0].value;
>         +                       git_check_attr(istate, NULL, two->path, check);
>         +                       b = check->items[0].value;
>         +
>         +                       if (!ATTR_UNSET(a) && !ATTR_UNSET(b) && !strcmp(a, b))
>         +                               set_diff_algorithm(o, a);
>         +
>         +                       attr_check_free(check);
>         +               }
>
> That also nicely keeps the line length shorter.
>
> > @@ -333,6 +333,8 @@ struct diff_options {
> >       int prefix_length;
> >       const char *stat_sep;
> >       int xdl_opts;
> > +     /* If xdl_opts has been set via the command line. */
> > +     int xdl_opts_command_line;
> >
> >       /* see Documentation/diff-options.txt */
> >       char **anchors;
> > diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh
> > index 8d1e408bb58..630c98ea65a 100644
> > --- a/t/lib-diff-alternative.sh
> > +++ b/t/lib-diff-alternative.sh
> > @@ -107,8 +107,27 @@ EOF
> >
> >       STRATEGY=$1
> >
> > +     test_expect_success "$STRATEGY diff from attributes" '
> > +             echo "file* diff-algorithm=$STRATEGY" >.gitattributes &&
> > +             test_must_fail git diff --no-index file1 file2 > output &&
> > +             test_cmp expect output
> > +     '
> > +
> >       test_expect_success "$STRATEGY diff" '
> > -             test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output &&
> > +             test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
>
> Nit: The usual style is ">output", not "> output".
>
> > +             test_cmp expect output
> > +     '
> > +
> > +     test_expect_success "$STRATEGY diff command line precedence before attributes" '
> > +             echo "file* diff-algorithm=meyers" >.gitattributes &&
> > +             test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
> > +             test_cmp expect output
> > +     '
> > +
> > +     test_expect_success "$STRATEGY diff attributes precedence before config" '
> > +             git config diff.algorithm default &&
> > +             echo "file* diff-algorithm=$STRATEGY" >.gitattributes &&
> > +             test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
> >               test_cmp expect output
> >       '
> >
> > @@ -166,5 +185,11 @@ EOF
> >               test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output &&
> >               test_cmp expect output
> >       '
> > +
> > +     test_expect_success "$STRATEGY diff from attributes" '
> > +             echo "file* diff-algorithm=$STRATEGY" >.gitattributes &&
> > +             test_must_fail git diff --no-index uniq1 uniq2 > output &&
> > +             test_cmp expect output
> > +     '
> >  }
>
> For some non-nitpicking, I do worry about exposing this as a DoS vector,
> e.g. here's a diff between two distant points in git.git with the
> various algorithms:
>
>         $ hyperfine -r 1 -L a patience,minimal,histogram,myers 'git diff --diff-algorithm={a} v2.0.0 v2.28.0'
>         Benchmark 1: git diff --diff-algorithm=patience v2.0.0 v2.28.0
>           Time (abs ≡):        42.121 s               [User: 41.879 s, System: 0.144 s]
>
>         Benchmark 2: git diff --diff-algorithm=minimal v2.0.0 v2.28.0
>           Time (abs ≡):        35.634 s               [User: 35.473 s, System: 0.160 s]
>
>         Benchmark 3: git diff --diff-algorithm=histogram v2.0.0 v2.28.0
>           Time (abs ≡):        46.912 s               [User: 46.657 s, System: 0.228 s]
>
>         Benchmark 4: git diff --diff-algorithm=myers v2.0.0 v2.28.0
>           Time (abs ≡):        33.233 s               [User: 33.072 s, System: 0.160 s]
>
>         Summary
>           'git diff --diff-algorithm=myers v2.0.0 v2.28.0' ran
>             1.07 times faster than 'git diff --diff-algorithm=minimal v2.0.0 v2.28.0'
>             1.27 times faster than 'git diff --diff-algorithm=patience v2.0.0 v2.28.0'
>             1.41 times faster than 'git diff --diff-algorithm=histogram v2.0.0 v2.28.0'

I'm really surprised by these numbers; they aren't remotely close to
what I compute.  Am I correct in understanding you ran these in
git.git?  Was your computer overloaded?  Was your git.git in some
serious need of repacking?  Was it on a network filesystem?  If you
run with more than 1 run, are your numbers even repeatable?

Using git compiled from current main, I see:

$ hyperfine -L a patience,minimal,histogram,myers './git diff
--diff-algorithm={a} v2.0.0 v2.28.0'
Benchmark 1: ./git diff --diff-algorithm=patience v2.0.0 v2.28.0
  Time (mean ± σ):      1.142 s ±  0.033 s    [User: 1.022 s, System: 0.114 s]
  Range (min … max):    1.117 s …  1.212 s    10 runs

Benchmark 2: ./git diff --diff-algorithm=minimal v2.0.0 v2.28.0
  Time (mean ± σ):      1.959 s ±  0.011 s    [User: 1.830 s, System: 0.120 s]
  Range (min … max):    1.947 s …  1.976 s    10 runs

Benchmark 3: ./git diff --diff-algorithm=histogram v2.0.0 v2.28.0
  Time (mean ± σ):      1.187 s ±  0.007 s    [User: 1.065 s, System: 0.115 s]
  Range (min … max):    1.175 s …  1.200 s    10 runs

Benchmark 4: ./git diff --diff-algorithm=myers v2.0.0 v2.28.0
  Time (mean ± σ):      1.194 s ±  0.007 s    [User: 1.068 s, System: 0.120 s]
  Range (min … max):    1.184 s …  1.206 s    10 runs

Summary
  './git diff --diff-algorithm=patience v2.0.0 v2.28.0' ran
    1.04 ± 0.03 times faster than './git diff
--diff-algorithm=histogram v2.0.0 v2.28.0'
    1.05 ± 0.03 times faster than './git diff --diff-algorithm=myers
v2.0.0 v2.28.0'
    1.71 ± 0.05 times faster than './git diff --diff-algorithm=minimal
v2.0.0 v2.28.0'

And this is on a kind-of low-end refurbished laptop from a few years
ago (although the repo was recently gc'ed).

I'm biased towards histogram (and making it the default rather than
making it configurable per file), but that's probably obvious given
that I made ort use it unconditionally.  And when I made ort use it,
it was actually a minor performance penalty (~1% IIRC)[*], but I
thought it was worth it since (a) histogram diffs are more
understandable to users in general, (b) the histogram diff data
structures provide an idea for possibly solving some ugly corner cases
that I don't see a way to solve with the other diffs.

[*] Phillip came along and made histogram faster after my measurements
(663c5ad035 ("diff histogram: intern strings", 2021-11-17)), so that
may not be true anymore.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux