Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 23/02/09 12:26AM, Elijah Newren wrote:
> Hi John,
> 
> On Mon, Feb 6, 2023 at 12:02 PM John Cai <johncai86@xxxxxxxxx> wrote:
> >
> > Hi Phillip,
> >
> > On 6 Feb 2023, at 11:27, Phillip Wood wrote:
> >
> > > Hi John
> > >
> > > On 05/02/2023 03:46, John Cai via GitGitGadget wrote:
> > >> From: John Cai <johncai86@xxxxxxxxx>
> > >>
> > >> It can be useful to specify diff algorithms per file type. For example,
> > >> one may want to use the minimal diff algorithm for .json files, another
> > >> for .c files, etc.
> > >
> > > Have you got any examples of why this is useful? I find myself occasionally changing the algorithm when the default gives a sub-optimal diff but I've not really noticed any pattern with respect to file types.
> >
> > At $DAYJOB, there has been a discussion and request for a feature like this [1].
> > One use case that came up was to be able to set a different diff algorithm for
> > .json files.
> >
> > 1. https://gitlab.com/gitlab-org/gitaly/-/issues/2591
> 
> A couple points:
> 
> First, there seems to be a misunderstanding in that issue.  In
> particular, the merge algorithm does call into the xdiff library to do
> the three-way content merge of individual files, and when it does so,
> it has to specify the diff algorithm (or take the default, currently
> myers).  merge-recursive allows the diff algorithm to be specified by
> the user (there are
> -Xdiff-algorithm={histogram,minimal,patience,myers} flags to
> merge/rebase for it), while merge-ort uses histogram (though it uses
> the same parser as merge-recursive and thus gets the variables set
> from the -Xdiff-algorithm flag, it just ignores those values and
> hardcodes histogram).
> 
> Second, I also think the user request got converted to a particular
> solution without looking at the wider problem space:  The idea seemed
> to assume "myers" is default for a good reason, and thus asked for an
> option to use something else.  I'm not sure the assumption is valid; I
> think "myers" is default for historical reasons and histogram is
> better not just for special Salesforce xml files, but code files too.
> The output makes more sense to users.  So much so that even though my
> simple testing suggested it had a 2% performance penalty compared to
> myers, I forced ort to use it[1] even though I designed  everything
> else in that algorithm around eking out maximum performance.  Others
> who have tested the diff algorithms have also found histogram has very
> similar performance to myers, and oftentimes even beats it[2][3].
> Also, worries about invalidating rerere caches[4] was real, but we
> already paid that price when we switched to ort.  And if performance
> is still a worry, [3] gives me reason to believe we can make our
> histogram implementation faster.  Finally, for the period of time when
> Palantir was letting me make an internal git distribution (mostly for
> testing ort), I also carried a patch that changed the default diff
> algorithm to histogram (not just for ort, but for diff/log/etc. as
> well).  Never had any complaints from the users from it.  Perhaps you
> could do the same in your local version of git used by gitaly?

Thanks for that suggestion, Elijah. Changing the diff alg to histogram has been
something we've considered doing as well. However, we've gotten customer
requests to also have the option to view their diffs with patience. Seeing as
the diff algorithm is sometimes a matter of preference, we thought giving users
the control through the repository would be nice.

thanks
John

> 
> [1] See c8017176ac ("merge-ort: use histogram diff", 2020-12-13)
> [2] From 85551232b5 ("perf: compare diff algorithms", 2012-03-06):
> "This does indeed show that histogram diff slightly beats Myers, while
> patience is much slower than the others."
> [3] https://github.com/pascalkuthe/imara-diff
> [4] https://lore.kernel.org/git/20120307114714.GA14990@xxxxxxxxxxxxxxxxxxxxx/



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux