Re: [PATCH] diff: add a config option to control orderfile

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Tue, 24 Sep 2013 08:54:19 +0300

On Mon, Sep 23, 2013 at 02:37:29PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Michael S. Tsirkin wrote:
> >> On Tue, Sep 17, 2013 at 04:56:16PM -0400, Jeff King wrote:
> 
> >>>>> A problem with both schemes, though, is that they are not
> >>>>> backwards-compatible with existing git-patch-id implementations.
> [...]
> >>> It may be esoteric enough not to worry about, though.
> 
> Yeah, I think it would be okay.  Details of the diff generation
> algorithm have changed from time to time anyway (and broken things,
> as you mentioned) and we make no guarantee about this.
> 
> [...]
> >> patch-id: make it more stable
> >>
> >> Add a new patch-id algorithm making it stable against
> >> hunk reordering:
> >> 	- prepend header to each hunk (if not there)
> >> 	- calculate SHA1 hash for each hunk separately
> >> 	- sum all hashes to get patch id
> >>
> >> Add --order-sensitive to get historical unstable behaviour.
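A minimal Python sketch of the per-hunk scheme in the quoted patch, under simplifying assumptions. Hunks are split on `diff --git` and `@@` lines, and each hunk is hashed together with its file header. "Sum all hashes" is read here as adding the SHA-1 digests as 160-bit integers modulo 2^160. The helper names `split_hunks` and `stable_patch_id` are hypothetical. Unlike real git patch-id, this sketch does not strip whitespace or line numbers.

```python
import hashlib

MOD = 1 << 160  # SHA-1 digests are 160 bits wide

def split_hunks(diff_text):
    """Split a unified diff into (file_header, hunk) string pairs.

    Assumes file sections start with 'diff --git' and hunks with '@@'.
    """
    hunks, header, current = [], [], None
    for line in diff_text.splitlines(keepends=True):
        if line.startswith("diff --git"):
            if current is not None:
                hunks.append(("".join(header), "".join(current)))
            header, current = [line], None
        elif line.startswith("@@"):
            if current is not None:
                hunks.append(("".join(header), "".join(current)))
            current = [line]
        elif current is not None:
            current.append(line)   # body line of the current hunk
        else:
            header.append(line)    # '---'/'+++'/index lines before any hunk
    if current is not None:
        hunks.append(("".join(header), "".join(current)))
    return hunks

def stable_patch_id(diff_text):
    total = 0
    for header, hunk in split_hunks(diff_text):
        # Prepend the file header so equal hunks in different files differ.
        digest = hashlib.sha1((header + hunk).encode()).digest()
        total = (total + int.from_bytes(digest, "big")) % MOD
    return format(total, "040x")

a = "diff --git a/a.c b/a.c\n--- a/a.c\n+++ b/a.c\n@@ -1 +1 @@\n-x\n+y\n"
b = "diff --git a/b.c b/b.c\n--- a/b.c\n+++ b/b.c\n@@ -1 +1 @@\n-p\n+q\n"
print(stable_patch_id(a + b) == stable_patch_id(b + a))  # True
```

Because addition is commutative, permuting file sections or hunks leaves the sum unchanged. That is the stability property the patch is after.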
> 
> The --order-sensitive option seems confusing.  How do I use it to
> replicate a historical patch-id?

You supply a historical diff to it :)

> If I record all options that might
> have influenced ordering (which are those?) then am I guaranteed to
> get a reproducible result?

Maybe not. But if you have a patch on disk, you will get the
old hash from it with --order-sensitive.

> So I would prefer either of the following over the above:
> 
>  a) When asked to compute the patch-id of a seekable file, use the
>     current streaming implementation until you notice a filename that
>     is out of order.  Then start over with sorted hunks (for example
>     building a table of offsets within the patch for each hunk to
>     support this).
> 
>     When asked to compute the patch-id of an unseekable file, stream
>     to a temporary file under $GIT_DIR to get a seekable file.
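A rough Python sketch of option (a). It assumes the patch is a seekable binary stream and that each file section starts with a `diff --git a/NAME b/NAME` line with no spaces in NAME. It also assumes "sorted hunks" means file sections sorted by name. `patch_id_seekable` is a hypothetical name, and the SHA-1 is taken over raw bytes rather than git's normalized form.

```python
import hashlib
import io

def patch_id_seekable(f):
    """Option (a) sketch: stream-hash until a file name is out of order,
    then start over and hash the file sections in sorted order."""
    h = hashlib.sha1()
    sections = []      # (file name, start offset) of each 'diff --git' section
    prev_name = None
    in_order = True
    offset = 0
    f.seek(0)
    for line in iter(f.readline, b""):
        if line.startswith(b"diff --git "):
            name = line.split()[2]  # the a/NAME path; assumes no spaces
            if prev_name is not None and name < prev_name:
                in_order = False    # stop streaming; keep collecting offsets
            prev_name = name
            sections.append((name, offset))
        if in_order:
            h.update(line)
        offset += len(line)
    if in_order:
        return h.hexdigest()
    # Start over: any preamble first, then sections sorted by file name.
    ends = [start for _, start in sections[1:]] + [offset]
    h = hashlib.sha1()
    f.seek(0)
    h.update(f.read(sections[0][1]))
    for (_name, start), end in sorted(zip(sections, ends)):
        f.seek(start)
        h.update(f.read(end - start))
    return h.hexdigest()

a = b"diff --git a/a.c b/a.c\n@@ -1 +1 @@\n-x\n+y\n"
b = b"diff --git a/b.c b/b.c\n@@ -1 +1 @@\n-p\n+q\n"
print(patch_id_seekable(io.BytesIO(a + b)) == patch_id_seekable(io.BytesIO(b + a)))  # True
```

An unseekable input such as a pipe could first be copied into `tempfile.TemporaryFile()` and then passed to the same function, in the spirit of the temporary-file suggestion.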

This can be computed in one pass: just keep two checksums around.

But the result won't be stable: if you get the same patch from two
people, and one copy is ordered while the other isn't, you get two
different checksums.
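A sketch of the one-pass, two-checksum idea. It assumes per-file sections stand in for hunks and that the order-insensitive checksum is a sum of SHA-1 digests modulo 2^160. `two_checksums` and `_add` are hypothetical helpers. The example also illustrates the objection about stability: the stream checksum still differs between two orderings of the same change, and only the second value agrees.

```python
import hashlib

MOD = 1 << 160  # treat SHA-1 digests as 160-bit integers

def _add(total, h):
    return (total + int.from_bytes(h.digest(), "big")) % MOD

def two_checksums(lines):
    """One pass over diff lines (bytes), returning two hex ids:
    the order-sensitive SHA-1 of the whole stream, and an order-insensitive
    sum of per-file-section SHA-1s. Lines before the first 'diff --git'
    only feed the first checksum."""
    stream = hashlib.sha1()
    section = None
    total = 0
    for line in lines:
        stream.update(line)
        if line.startswith(b"diff --git "):
            if section is not None:
                total = _add(total, section)   # close the previous section
            section = hashlib.sha1()
        if section is not None:
            section.update(line)
    if section is not None:
        total = _add(total, section)
    return stream.hexdigest(), format(total, "040x")

a = b"diff --git a/a.c b/a.c\n@@ -1 +1 @@\n-x\n+y\n"
b = b"diff --git a/b.c b/b.c\n@@ -1 +1 @@\n-p\n+q\n"
s1, i1 = two_checksums((a + b).splitlines(keepends=True))
s2, i2 = two_checksums((b + a).splitlines(keepends=True))
print(s1 == s2, i1 == i2)  # False True
```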

What are we trying to achieve here?

>  b) Unconditionally use the new patch-id definition that is stable
>     under permutation of hunks.  If and when someone complains that
>     this invalidates their old patch-ids, they can work on adding a
>     nice interface for getting the old-style patch-ids.  I suspect it
>     just wouldn't come up.

That's certainly easy to implement.

> Of course I can easily be wrong.  Thanks for a clear patch that makes
> the choices easy to reason about.
> 
> Thoughts?
> Jonathan