Re: avoid duplicate patches from git log ?

"Philip Oakley" <philipoakley@xxxxxxx> · Tue, 3 May 2016 23:36:32 +0100

From: "Junio C Hamano" <gitster@xxxxxxxxx>
Jeff King <peff@xxxxxxxx> writes:

On Tue, May 03, 2016 at 09:11:55PM +0100, Philip Oakley wrote:

However, as the G4W project (https://github.com/git-for-windows/git/)
follows the main git repo and its releases, it needs to rebase its fixup
patches, while retaining their original series, so has repeated copies of
those fix patches on the second parent path (a technique Dscho called
rebasing merges).

for example:
> bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
> a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
> 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
> 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
> 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)

How can I filter out all the duplicate patches which are identical other
than the commit date?

The --left --right and --cherry don't appear to do what I'd expect/hope.
Any suggestions?

I don't think there's a good way right now. The option that suppresses
commits is --cherry-pick, but it wants there to be a "left" and "right"
from a symmetric difference, and to cull duplicates from the various
sides.

I think you really just want to keep a running list of all of the
commits you've seen and cull any duplicates. I guess you'd want this as
part of the history simplification step, so that whole uninteresting
side-branches are culled.

The obvious choice for matching two commits is patch-id, though it can
be expensive to generate. There have been patches playing around with
caching in the past, but nothing merged. For your purposes, I suspect
matching an "(author, authordate, subject)" tuple would be sufficient
and fast.

What would be really interesting is what should happen when the side
"rebase merge" branch that is supposed to be irrelevant for the
purpose of explaining the overall history does not become empty
after such filtering operation.  The merge commit itself may claim
that both branches are equivalent, but in reality it may turn out
that the merge failed to reflect the effect of some other changes in
the history of the side branch in the result--which would be a
ticking time-bomb for future mismerges waiting to happen.

I think that's a misunderstanding of the development process for an "on top 
of" project, where the upstream would not be expected to take all the fixups 
for that project's customers.

The releases of the project do need to be retained in the history, but 
because of the "on top of" policy, the prior release becomes a second parent 
to a "theirs" merge commit of the upstream (and subsequent rebase on top of 
that).

Thus when searching history the first parent route would have the fastest 
transition to the upstream, but the full history would still have all the 
releases on it.

It may be that Peff's suggestion is a workable heuristic for a rebase flow 
where one could eliminate those duplicates quite easily. I just had a 
feeling that there was already something that did the patch-id thing for 
duplicate removals, but obviously I had that wrong.
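
For what it's worth, here is a rough sketch of how that "(author,
authordate, subject)" matching might look as a post-filter on git log
output. It is only an illustration, not anything git does itself: the
names dedupe_commits, git_log_lines, SEP and LOG_FORMAT are made up for
the example, and it assumes the author name, email and author timestamp
together are a good enough stand-in for patch identity. It filters the
log output after the fact, so it does not take part in history
simplification and will not cull whole side branches.

```python
import subprocess

# Hypothetical field separator: the ASCII unit separator, emitted by git as %x1f.
SEP = "\x1f"
# Abbreviated hash, author name, author email, author date (Unix time), subject.
LOG_FORMAT = SEP.join(["%h", "%an", "%ae", "%at", "%s"])


def dedupe_commits(lines):
    """Yield (hash, subject) for the first record seen with each
    (author name, author email, author date, subject) key.

    git log lists newest commits first by default, so the copy that is
    kept is normally the most recently rebased one.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        sha, name, email, authordate, subject = line.split(SEP, 4)
        key = (name, email, authordate, subject)
        if key in seen:
            continue  # same author, author date and subject: treat as a duplicate
        seen.add(key)
        yield sha, subject


def git_log_lines(*rev_args):
    """Run git log in the current repository and return its output lines."""
    result = subprocess.run(
        ["git", "log", "--format=" + LOG_FORMAT, *rev_args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Inside a repository, something like
"for sha, subject in dedupe_commits(git_log_lines()): print(sha, subject)"
would then list each patch once. Two genuinely different commits that
share author, author date and subject would be collapsed as well, which
is the price of not computing a patch-id.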

--
Philip 
