Re: [PATCH v3 1/3] patch-id: make it stable against hunk reordering

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Mon, 31 Mar 2014 22:04:44 +0300

On Mon, Mar 31, 2014 at 10:59:50AM -0700, Junio C Hamano wrote:
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> writes:
> 
> > Patch id changes if users
> > 1. reorder file diffs that make up a patch
> > or
> > 2. split a patch up to multiple diffs that touch the same path
> > (keeping hunks within a single diff ordered to make patch valid).
> >
> > As the result is functionally equivalent, a different patch id is
> > surprising to many users.
> > In particular, reordering files using diff -O is helpful to make patches
> > more readable (e.g. API header diff before implementation diff).
> >
> > Change patch-id behaviour making it stable against these two kinds
> > of patch change:
> > 1. calculate SHA1 hash for each hunk separately and sum all hashes
> > (using a symmetrical sum) to get patch id
> > 2. hash the file-level headers together with each hunk (not just the
> > first hunk)
> >
> > We use a 20byte sum and not xor - since xor would give 0 output
> > for patches that have two identical diffs, which isn't all that
> > unlikely (e.g. append the same line in two places).
> >
> > Add a new flag --unstable to get the historical behaviour.
> >
> > Add --stable which is a nop, for symmetry.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > ---
> >
> > changes from v2:
> > 	several bugfixes
> > changes from v1:
> > 	hanges from v1: documented motivation for supporting
> > 	diff splitting (and not just file reordering).
> > 	No code changes.
> >
> >  builtin/patch-id.c | 72 ++++++++++++++++++++++++++++++++++++++++++------------
> >  1 file changed, 56 insertions(+), 16 deletions(-)
> 
> Does this have to interact or be consistent with patch-ids.c which
> is the real patch-id machinery used to filter like-changes out by
> "cherry-pick" and "log --cherry-pick"?

I don't know off-hand.

Specifically, this is diff_flush_patch_id and in diff.c, isn't it?

> This series opens a very interesting opportunity by making it
> possible to introduce the equivalence between two patches that touch
> the same file and a single patch that concatenates hunks from these
> two patches.
> 
> One example I am wondering about is perhaps this could be used to
> detect two branches, both sides with many patches cherry-picked from
> the same source, but some patches squashed together on one branch
> but not on the other.  It would be very nice if you can detect that
> two sets of patches are equivalent taken as a whole in such a
> situation while rebasing one on top of the other.
> 
> Another example is that another mode that gives a set of broken-up
> patch-ids for each hunk contained in the input.  Suppose there is a
> patch that is only meant to be used on the proprietary fork of an
> open source project, and the project releases the open source
> portion by cherry-picking topics from the development tree used for
> the proprietary "trunk".  The integration service of such a project
> used to prepare the open source branch may want to have a
> pre-receive hook that says "do not merge any commit to cause this
> this hunk appear in the result, no matter what other changes the
> patches in the commit may bring in", and broken-down patch-ids
> (e.g. "diff HEAD...$commit | patch-id --individual") may be an
> ingredient to implement such a hook.  There may be interesting
> applications other than such a "broken-down patch-ids" that can be
> based on the enhancement you are presenting here.

OK sure, I can tweak that to use the same algorithm if desired,
though it does look like an unrelated enhancement to me.
Agree?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html