On Tue, Jan 12, 2021 at 09:17:37AM -0500, Jeff King wrote: > It looks like this was broken in v2.10.0, via dfb7a1b4d0 (patch-ids: > stop using a hand-rolled hashmap implementation, 2016-07-29). > > I think the issue is that it is allowing duplicate entries in the > hashmap. The algorithm is something like: > > - iterate over left-hand commits, inserting patch-id for each into > hashmap > > - iterate over right-hand commits, seeing if any are present in > hashmap. If so, we exclude the commit _and_ mark the patch-id as > "seen" > > - iterate again over left-hand commits, omitting any whose patch-ids > are marked as "seen" > > So if two commits on the left-hand side have the same patch-id, if we > insert two entries into the hashmap, then only one of them is going to > get its "seen" flag marked in the middle step. Yeah, that's definitely it. Here's what the fix would look like directly on top of dfb7a1b4d0: diff --git a/patch-ids.c b/patch-ids.c index db31fa647a..a8ed4f0872 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -66,12 +66,24 @@ struct patch_id *add_commit_patch_id(struct commit *commit, struct patch_ids *ids) { struct patch_id *key = xcalloc(1, sizeof(*key)); + struct patch_id *old; if (init_patch_id_entry(key, commit, ids)) { free(key); return NULL; } + /* + * Don't allow duplicates. Note that we can't use hashmap_put here; we + * must retain the _old_ struct, because some commit->util is pointing + * to it. + */ + old = hashmap_get(&ids->patches, key, ids); + if (old) { + free(key); + return old; + } + hashmap_add(&ids->patches, key); return key; } diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh index 28d4f6b259..0c5f1dcc54 100755 --- a/t/t6007-rev-list-cherry-pick-file.sh +++ b/t/t6007-rev-list-cherry-pick-file.sh @@ -207,4 +207,16 @@ test_expect_success '--count --left-right' ' test_cmp expect actual ' +test_expect_success '--cherry-pick with duplicates on each side' ' + git checkout -b dup-orig && + test_commit dup-base && + git revert dup-base && + git cherry-pick dup-base && + git checkout -b dup-side HEAD~3 && + test_tick && + git cherry-pick -3 dup-orig && + git rev-list --cherry-pick dup-orig...dup-side >actual && + test_must_be_empty actual +' + test_done Unfortunately we can't do the same thing now. The "seen" flag went away in 683f17ec44 (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), in favor of having each patch_id struct point to a "struct commit". We could revert that, or turn that pointer into a list, but it gets worse. Later, b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the patch-ids. So each patch_id struct must point to exactly one commit! We'll have to either: - split them apart into two structs at the time of the lazy computation or - accept having duplicate patch_ids, but on lookup make sure we find _all_ matches, not just the first one. Hmm, that last one might not be too bad, since we have hashmap_get_next() (it does mean iterating over a chained bucket linearly, but that's already the worst-case for a hash lookup). -Peff