On Tue, Sep 17, 2013 at 11:38:07PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 17, 2013 at 04:18:28PM -0400, Jeff King wrote:
> > On Tue, Sep 17, 2013 at 11:16:04PM +0300, Michael S. Tsirkin wrote:
> >
> > > > Thinking about it some more, it's a best effort thing anyway,
> > > > correct?
> > > >
> > > > So how about, instead of doing a hash over the whole input,
> > > > we hash each chunk and XOR them together?
> > > >
> > > > This way it will be stable against chunk reordering, and
> > > > no need to keep patch in memory.
> > > >
> > > > Hmm?
> > > ENOCOFFEE
> > >
> > > That was a silly suggestion, two identical chunks aren't that unlikely :)
> > In a single patch, they should not be, as we should be taking into
> > account the filenames, no?
> Right.
>
> > You could also do it hierarchically. Hash each chunk, store only the
> > hashes, then sort them and hash the result. That still has O(chunks)
> > storage, but it is only one hash per chunk, not the whole data.
> Could be optional too :)
> Or maybe just sum byte by byte instead. One's complement probably ...
> > A problem with both schemes, though, is that they are not
> > backwards-compatible with existing git-patch-id implementations.
> Could you clarify?
> We never send patch IDs on the wire - how isn't this compatible?
> > Whereas
> > sorting the data itself is (kind of, at least with respect to people who
> > are not using orderfile).
> >
> > -Peff
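
For concreteness, the XOR scheme discussed above can be sketched in a few
lines of Python. This is illustrative only, not git's actual C
implementation: the hunk strings are made up, and hashing each hunk as
text that includes its filename header is the property Peff appeals to
when arguing that identical chunks are unlikely within a single patch.

  import hashlib

  def xor_patch_id(hunks):
      # XOR the per-hunk SHA-1 digests together. XOR is commutative,
      # so reordering hunks cannot change the result -- but two
      # identical hunks cancel to zero, the flaw behind "ENOCOFFEE".
      acc = bytes(20)
      for hunk in hunks:
          digest = hashlib.sha1(hunk.encode()).digest()
          acc = bytes(a ^ b for a, b in zip(acc, digest))
      return acc.hex()

  # Hypothetical hunks, each carrying its filename header:
  hunks = ["--- a/foo\n+++ b/foo\n@@ -1 +1 @@\n-x\n+y\n",
           "--- a/bar\n+++ b/bar\n@@ -2 +2 @@\n-p\n+q\n"]
  assert xor_patch_id(hunks) == xor_patch_id(hunks[::-1])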
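
The hierarchical variant is just as short to model, again as a sketch of
the proposal rather than existing git code: hash each hunk, keep only the
20-byte digests, sort them, and hash the concatenation. Storage stays
O(chunks) at one digest per hunk, and unlike the XOR scheme, duplicate
hunks do not cancel out.

  import hashlib

  def sorted_patch_id(hunks):
      # One digest per hunk (not the whole data), sorted so that
      # hunk order is irrelevant, then hashed again into one ID.
      digests = sorted(hashlib.sha1(h.encode()).digest() for h in hunks)
      outer = hashlib.sha1()
      for d in digests:
          outer.update(d)
      return outer.hexdigest()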
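
The byte-wise sum aside can be modeled the same way, read here as summing
the per-hunk digests with one's-complement addition (end-around carry, as
in the IP checksum), which is likewise commutative. This reading is only a
guess at the one-line suggestion, and a plain sum is of course far weaker
against collisions than hashing the sorted digests.

  import hashlib

  def ones_complement_patch_id(hunks, width=20):
      # Sum the per-hunk digests byte by byte, then fold the carries
      # back in (one's-complement addition). Order-independent, but
      # sums collide far more easily than SHA-1 digests do.
      acc = [0] * width
      for hunk in hunks:
          digest = hashlib.sha1(hunk.encode()).digest()
          for i, b in enumerate(digest):
              acc[i] += b

      def fold(v):
          while v > 0xff:
              v = (v & 0xff) + (v >> 8)
          return v

      return bytes(fold(v) for v in acc).hex()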