On Tue, May 9, 2017 at 5:16 AM, Jeff King <peff@xxxxxxxx> wrote: > On Mon, May 01, 2017 at 12:34:38PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > I don't know if we would want to be extra paranoid about patch-ids. >> > There is no helping: >> > >> > git rev-list HEAD | git diff-tree --stdin -p | git patch-id --stable >> > >> > because diff-tree doesn't know that it's trying for "--stable" output. >> > But the diffs we compute internally for patch-id could disable the >> > heuristics. I'm not sure if those matter, though. AFAIK those are used >> > only for internal comparisons within a single program. I.e., we never >> > compare them against input from the user, nor do we output them to the >> > user. So they'll change, but I don't think anybody would care. >> >> I have a few-million row table with commit_id as one column & patch_id >> as another. I.e. a commit -> patch_id mapping. > > Thanks for this data point. It's always interesting to hear about > unforeseen uses of the tools. > > Out of curiosity, how do you generate the patch-ids? Is it with > something like diff-tree piped to patch-id? This: my $cmd = qq[git --git-dir="$repository_path" log --since="$since" --until="$until" --all --pretty=format:%H --binary | git patch-id]; open my $patch_id_fh, " $cmd |"; Which is part of a loop that generates since/until for continuous pull/insertion. Also, a few lines later there's a workaround for the git.git bug of patch-id being ^0+$ (fixed in 2485eab55c ("git-patch-id: do not trip over "no newline" markers", 2011-02-17)), which gives you a sense of how long it's been since anyone's touched this. > I do feel a bit sad about breaking this case (or at the very least > forcing you to set an option to retain cross-version compatibility). But > my gut says that we don't want to lock ourselves into never changing the > diff algorithm (and I'm sure we've done it inadvertently a few times > over the years; even the recent switch to turning on renames would have > had that impact). As noted I think it's completely fine to change the patch-ids by changing the diff algorithm. I'm about to give some more detail on this in the other thread, but I find that on our repos the indent heuristic changes the patch-id for around 2% of patches, which seems fairly typical for non-changelog-y code. You *then* need to be using topic branches you didn't delete as well as having authored such a patch for this change to kick in, so the impact is really minimal. Even if it somehow changed 100% of the ids that would be fine too. It would auto-heal as the same git version started reading & inserting the ids, which are only relevant in a moving window.