Re: [PATCH 0/2] Make diff plumbing commands respect the indentHeuristic.

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 9 May 2017 09:58:28 +0200

On Tue, May 9, 2017 at 5:16 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, May 01, 2017 at 12:34:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I don't know if we would want to be extra paranoid about patch-ids.
>> > There is no helping:
>> >
>> >   git rev-list HEAD | git diff-tree --stdin -p | git patch-id --stable
>> >
>> > because diff-tree doesn't know that it's trying for "--stable" output.
>> > But the diffs we compute internally for patch-id could disable the
>> > heuristics. I'm not sure if those matter, though. AFAIK those are used
>> > only for internal comparisons within a single program. I.e., we never
>> > compare them against input from the user, nor do we output them to the
>> > user. So they'll change, but I don't think anybody would care.
>>
>> I have a few-million row table with commit_id as one column & patch_id
>> as another. I.e. a commit -> patch_id mapping.
>
> Thanks for this data point. It's always interesting to hear about
> unforeseen uses of the tools.
>
> Out of curiosity, how do you generate the patch-ids? Is it with
> something like diff-tree piped to patch-id?

This:

    my $cmd = qq[git --git-dir="$repository_path" log --since="$since"
--until="$until" --all --pretty=format:%H --binary | git patch-id];
    open my $patch_id_fh, " $cmd |";

Which is part of a loop that generates since/until for continuous
pull/insertion. Also, a few lines later there's a workaround for the
git.git bug of patch-id being ^0+$ (fixed in 2485eab55c
("git-patch-id: do not trip over "no newline" markers", 2011-02-17)),
which gives you a sense of how long it's been since anyone's touched
this.

> I do feel a bit sad about breaking this case (or at the very least
> forcing you to set an option to retain cross-version compatibility). But
> my gut says that we don't want to lock ourselves into never changing the
> diff algorithm (and I'm sure we've done it inadvertently a few times
> over the years; even the recent switch to turning on renames would have
> had that impact).

As noted I think it's completely fine to change the patch-ids by
changing the diff algorithm.

I'm about to give some more detail on this in the other thread, but I
find that on our repos the indent heuristic changes the patch-id for
around 2% of patches, which seems fairly typical for non-changelog-y
code. You *then* need to be using topic branches you didn't delete as
well as having authored such a patch for this change to kick in, so
the impact is really minimal.

Even if it somehow changed 100% of the ids that would be fine too. It
would auto-heal as the same git version started reading & inserting
the ids, which are only relevant in a moving window.