Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Sat, May 11, 2013 at 2:49 PM, John Keeping <john@xxxxxxxxxxxxx> wrote: >> >> Hmm... I hadn't realised that. Looking a bit closer, it looks like >> init_patch_ids sets up its own diffopts so its not affected by the >> command line (except for pathspecs which would be easy to check for). >> Of course that still means it can be affected by settings in the user's >> configuration. > > .. and in the actual diff algorithm. As to the "objection" side of the argument, I already said essentially the same thing several months ago: http://thread.gmane.org/gmane.comp.version-control.git/202654/focus=202898 and do not have much to add [*1*]. However. The use of patch-id in cherry and rebase is to facilitate avoiding to replay commits that are obviously identical to the ones you have in your history. The cached patch id for an existing old commit may differ from a patch id you freshly compute for a new commit you are trying to see if it truly new, even though they may represent the same change. So we may incorrectly think such a new commit is not yet in your history and attempt to replay it. But it is not a big problem. Either 3-way merge notices that there is nothing new, or you get a conflict and have chance to inspect what is going on. A conceptually much larger and more problematic issue is that we may discard a truly new change that you still need as an old one you already have due to a hash collision and discard it. Because the hash space of SHA-1 is so large, however, it is not a problem in practice, and more importantly, that hash space is just as large as the hash space used by Git to reduce a patch to a patch id, the filtering done with patch-id in cherry and rebase _already_ have that exact problem with or without this additional cache layer. A stale cache may make the possibility of lost change due to such a hash collision merely twice as likely. > ... it's a "the patch ID actually ignores a lot of data in order > to give the same ID even if lins have been added above it, and the > patch is at different line numbers etc". Yes. > So maybe it doesn't matter. But at the same time, I really think > caching patch ID's should be something people should be aware of is > fundamentally wrong, even if it might work. I do not think it is "caching patch ID" that people should be aware of is fundamentally wrong. What is fundamentally wrong, even if it might work, is "using patch ID" itself. > And quite frankly, if you do rebases etc so much that you think patch > ID's are so important that they need to be cached, you may be doing > odd/wrong things. And that, too ;-) [Footnote] *1* For people listening from the sidelines, the fact that Git algorithm can improve over time is a real issue, and and has caused one issue that still hasn't been solved in the k.org upload process. Somebody who has a repository there could *theoretically*: - push her v1.1 release via Git ("git push origin v1.1"); - create a tarball ("git archive -o v1.1.tar v1.1") and diff since the last release ("git diff v1.0 v1.1 >v1.0-v1.1.diff") locally; - GPG sign them ("gpg -b v1.1.tar", "gpg -b v1.0-v1.1.diff"); and - upload only the signature files and have k.org create the tarballs and diff to save bandwidth of uploading logically derivable stuff over and over again. But that can be done only when output from "git archive" and "git diff" are stable, which is not the case. We changed how extended header fields are used in the tar archive for a long pathname recently, and also we changed use of XDF_NEED_MINIMAL a couple of years ago in "git diff"; both of these affect the output. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html