On Sat, May 11, 2013 at 08:00:44PM -0700, Junio C Hamano wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > On Sat, May 11, 2013 at 2:49 PM, John Keeping <john@xxxxxxxxxxxxx> wrote:
> >>
> >> Hmm... I hadn't realised that. Looking a bit closer, it looks like
> >> init_patch_ids sets up its own diffopts so its not affected by the
> >> command line (except for pathspecs which would be easy to check for).
> >> Of course that still means it can be affected by settings in the user's
> >> configuration.
> >
> > .. and in the actual diff algorithm.
>
> As to the "objection" side of the argument, I already said
> essentially the same thing several months ago:
>
>     http://thread.gmane.org/gmane.comp.version-control.git/202654/focus=202898
>
> and do not have much to add [*1*].
>
> However.
>
> The use of patch-id in cherry and rebase is to facilitate avoiding
> to replay commits that are obviously identical to the ones you have
> in your history. The cached patch id for an existing old commit may
> differ from a patch id you freshly compute for a new commit you are
> trying to see if it truly new, even though they may represent the
> same change. So we may incorrectly think such a new commit is not
> yet in your history and attempt to replay it.
>
> But it is not a big problem. Either 3-way merge notices that there
> is nothing new, or you get a conflict and have chance to inspect
> what is going on.

It's not a problem here, but false negatives would be annoying if you're
looking at "git log --cherry-mark".

> A conceptually much larger and more problematic issue is that we may
> discard a truly new change that you still need as an old one you
> already have due to a hash collision and discard it. Because the
> hash space of SHA-1 is so large, however, it is not a problem in
> practice, and more importantly, that hash space is just as large as
> the hash space used by Git to reduce a patch to a patch id, the
> filtering done with patch-id in cherry and rebase _already_ have
> that exact problem with or without this additional cache layer. A
> stale cache may make the possibility of lost change due to such a
> hash collision merely twice as likely.
>
> > ... it's a "the patch ID actually ignores a lot of data in order
> > to give the same ID even if lins have been added above it, and the
> > patch is at different line numbers etc".
>
> Yes.
>
> > So maybe it doesn't matter. But at the same time, I really think
> > caching patch ID's should be something people should be aware of is
> > fundamentally wrong, even if it might work.
>
> I do not think it is "caching patch ID" that people should be aware
> of is fundamentally wrong. What is fundamentally wrong, even if it
> might work, is "using patch ID" itself.
>
> > And quite frankly, if you do rebases etc so much that you think patch
> > ID's are so important that they need to be cached, you may be doing
> > odd/wrong things.
>
> And that, too ;-)

I've never noticed a problem with rebases; it's when I use "git log
--cherry master..." to see if patches I've sent to a mailing list have
been picked up.

To take Git as an example (albeit a bad one because "What's Cooking" is
a more useful way to track patch state here), if I compare this patch to
pu I have:

    $ git rev-list --left-right --count pu...
    234     1

and caching patch IDs takes that from ~0.6s to ~0.1s. When doing that
over several branches consecutively that makes a big difference to the
overall runtime, especially because most of the commits of interest will
be cached during the first one.
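
To make the trade-off above concrete, here is a minimal Python sketch of
a per-commit patch-ID cache. It is an illustration only, not Git's
implementation: toy_patch_id() is a simplified stand-in (it drops hunk
headers and whitespace before hashing), and the PatchIdCache class and
its get() method are hypothetical names invented for this example.

```python
import hashlib


def toy_patch_id(diff_text):
    """Simplified stand-in for a patch ID (NOT Git's actual algorithm).

    Hunk headers carry the line numbers, so skipping them and dropping
    whitespace gives the same ID when a change merely moves in the file.
    """
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            continue  # ignore line-number information
        h.update("".join(line.split()).encode())
    return h.hexdigest()


class PatchIdCache:
    """Hypothetical cache mapping a commit name to its computed patch ID."""

    def __init__(self, compute):
        self._compute = compute
        self._ids = {}
        self.misses = 0  # how often the ID had to be computed

    def get(self, commit, diff_text):
        if commit not in self._ids:
            self.misses += 1
            self._ids[commit] = self._compute(diff_text)
        # A cached entry is returned as-is, even if the way diffs are
        # generated has changed since it was stored (the "stale" case).
        return self._ids[commit]


# The same change at two different offsets yields the same toy ID.
diff_a = "@@ -1,3 +1,4 @@\n context\n+new line\n"
diff_b = "@@ -40,3 +41,4 @@\n context\n+new line\n"
same_change = toy_patch_id(diff_a) == toy_patch_id(diff_b)

# Looking up one commit twice computes its ID only once.
cache = PatchIdCache(toy_patch_id)
cache.get("commit-1", diff_a)
cache.get("commit-1", diff_a)
```

Keying the cache on the commit alone is what makes repeated lookups
cheap, and it is also why an entry can go stale: nothing in the key
records the diff settings that were in effect when the ID was computed.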