"John Cai via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: Looking good. Some comments below. Many of them minor. > +Setting the internal diff algorithm > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > + > +The diff algorithm can be set through the `diff.algorithm` config key, but > +sometimes it may be helpful to set the diff algorithm by path. For example, one I would have expected "per path" instead of "by path". > +might wish to set a diff algorithm automatically for all `.json` files such that > +the user would not need to pass in a separate command line `--diff-algorithm` > +flag each time. While this is not incorrect per-se, I think the first paragraph of the proposed commit log message was a lot more convincing. Your changes may not be limited to a single kind of files, and a command line option is simply not enough. You may want one algorithm for ".json" while using another for ".c", which was really an excellent example you gave. > +This diff algorithm applies to user facing diff output like git-diff(1), > +git-show(1) and is used for the `--stat` output as well. The merge machinery > +will not use the diff algorithm set through this method. Is "format-patch" considered "user-facing"? > +NOTE: If the `command` key also exists, then Git will treat this as an external > +diff and attempt to use the value set for `command` as an external program. For > +instance, the following config, combined with the above `.gitattributes` file, > +will result in `command` favored over `algorithm`. > + > +---------------------------------------------------------------- > +[diff "<name>"] > + command = j-c-diff > + algorithm = histogram > +---------------------------------------------------------------- Isn't this a bit too verbose, given that the reader has just seen the external diff driver section. I wonder something like this is sufficient, without any sample configuration? NOTE: If `diff.<name>.command` is defined for path with the `diff=<name>` attribute, it is executed as an external diff driver (see above), and adding `diff.<name>.algorithm` has no effect (the algorithm is not passed to the external diff driver). > diff --git a/diff.c b/diff.c > index 5efc22ca06b..04469da6d34 100644 > --- a/diff.c > +++ b/diff.c > @@ -4456,15 +4456,13 @@ static void run_diff_cmd(const char *pgm, > const char *xfrm_msg = NULL; > int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score; > int must_show_header = 0; > + struct userdiff_driver *drv = NULL; > > - > - if (o->flags.allow_external) { > - struct userdiff_driver *drv; > - > + if (o->flags.allow_external || !o->ignore_driver_algorithm) > drv = userdiff_find_by_path(o->repo->index, attr_path); > - if (drv && drv->external) > - pgm = drv->external; > - } > + > + if (o->flags.allow_external && drv && drv->external) > + pgm = drv->external; OK. There is no explicit "pgm = NULL" initialization in this function, but that is done by the caller passing NULL to the function as a parameter, so it all makes sense. > @@ -4481,12 +4479,16 @@ static void run_diff_cmd(const char *pgm, > run_external_diff(pgm, name, other, one, two, xfrm_msg, o); > return; > } > - if (one && two) > + if (one && two) { > + if (drv && !o->ignore_driver_algorithm && drv->algorithm) > + set_diff_algorithm(o, drv->algorithm); For symmetry with the above choice of pgm we just saw, the order of the condition might be easier to follow if written like so: if (!o->ignore_driver_algorithm && drv && drv->algorithm) It would not make any measurable difference performance-wise either way. > @@ -4583,6 +4585,14 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o, > const char *name; > const char *other; > > + if (!o->ignore_driver_algorithm) { > + struct userdiff_driver *drv = userdiff_find_by_path(o->repo->index, p->one->path); That's an overlong line. > + > + if (drv && drv->algorithm) { > + set_diff_algorithm(o, drv->algorithm); > + } No need to have {} around a single statement block. > + } > + > if (DIFF_PAIR_UNMERGED(p)) { > /* unmerged */ > builtin_diffstat(p->one->path, NULL, NULL, NULL, > @@ -5130,6 +5140,8 @@ static int diff_opt_diff_algorithm(const struct option *opt, > return error(_("option diff-algorithm accepts \"myers\", " > "\"minimal\", \"patience\" and \"histogram\"")); > > + options->ignore_driver_algorithm = 1; > + > return 0; > } > > @@ -5145,6 +5157,8 @@ static int diff_opt_diff_algorithm_no_arg(const struct option *opt, > BUG("available diff algorithms include \"myers\", " > "\"minimal\", \"patience\" and \"histogram\""); > > + options->ignore_driver_algorithm = 1; > + > return 0; > } > @@ -5285,6 +5299,7 @@ static int diff_opt_patience(const struct option *opt, > for (i = 0; i < options->anchors_nr; i++) > free(options->anchors[i]); > options->anchors_nr = 0; > + options->ignore_driver_algorithm = 1; > > return set_diff_algorithm(options, "patience"); > } I was hoping that set_diff_algorithm() can be the shared common one that signals we were told to use a specific algorithm, but it also is called from internal codepaths so it cannot be it. It is probably not worth introducing an extra helper that only calls set_diff_algorithm() and sets ignore_driver_algorithm bit only for that to reduce three-times repetition. OK. > diff --git a/userdiff.c b/userdiff.c > index d71b82feb74..ff25cfc4b4c 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -293,7 +293,7 @@ PATTERNS("scheme", > "|([^][)(}{[ \t])+"), > PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$", > "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"), > -{ "default", NULL, -1, { NULL, 0 } }, > +{ "default", NULL, NULL, -1, { NULL, 0 } }, > }; I was surprised that there is so little damage to the built-in userdiff driver definitions, but this is thanks to the PATTERNS() and IPATTERN() macro that use designated initializers. Very nice. Nicely done.