So instead of: id = has_commit_patch_id(commit, &ids); We should use some sort of iterator, in order to call: id->commit->object.flags |= cherry_flag; for all commits which are related to this patch? -- Arnaud Morin On 12.01.21 - 10:11, Jeff King wrote: > On Tue, Jan 12, 2021 at 09:17:37AM -0500, Jeff King wrote: > > > It looks like this was broken in v2.10.0, via dfb7a1b4d0 (patch-ids: > > stop using a hand-rolled hashmap implementation, 2016-07-29). > > > > I think the issue is that it is allowing duplicate entries in the > > hashmap. The algorithm is something like: > > > > - iterate over left-hand commits, inserting patch-id for each into > > hashmap > > > > - iterate over right-hand commits, seeing if any are present in > > hashmap. If so, we exclude the commit _and_ mark the patch-id as > > "seen" > > > > - iterate again over left-hand commits, omitting any whose patch-ids > > are marked as "seen" > > > > So if two commits on the left-hand side have the same patch-id, if we > > insert two entries into the hashmap, then only one of them is going to > > get its "seen" flag marked in the middle step. > > Yeah, that's definitely it. Here's what the fix would look like directly > on top of dfb7a1b4d0: > > diff --git a/patch-ids.c b/patch-ids.c > index db31fa647a..a8ed4f0872 100644 > --- a/patch-ids.c > +++ b/patch-ids.c > @@ -66,12 +66,24 @@ struct patch_id *add_commit_patch_id(struct commit *commit, > struct patch_ids *ids) > { > struct patch_id *key = xcalloc(1, sizeof(*key)); > + struct patch_id *old; > > if (init_patch_id_entry(key, commit, ids)) { > free(key); > return NULL; > } > > + /* > + * Don't allow duplicates. Note that we can't use hashmap_put here; we > + * must retain the _old_ struct, because some commit->util is pointing > + * to it. > + */ > + old = hashmap_get(&ids->patches, key, ids); > + if (old) { > + free(key); > + return old; > + } > + > hashmap_add(&ids->patches, key); > return key; > } > diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh > index 28d4f6b259..0c5f1dcc54 100755 > --- a/t/t6007-rev-list-cherry-pick-file.sh > +++ b/t/t6007-rev-list-cherry-pick-file.sh > @@ -207,4 +207,16 @@ test_expect_success '--count --left-right' ' > test_cmp expect actual > ' > > +test_expect_success '--cherry-pick with duplicates on each side' ' > + git checkout -b dup-orig && > + test_commit dup-base && > + git revert dup-base && > + git cherry-pick dup-base && > + git checkout -b dup-side HEAD~3 && > + test_tick && > + git cherry-pick -3 dup-orig && > + git rev-list --cherry-pick dup-orig...dup-side >actual && > + test_must_be_empty actual > +' > + > test_done > > > Unfortunately we can't do the same thing now. The "seen" flag went away > in 683f17ec44 (patch-ids: replace the seen indicator with a commit > pointer, 2016-07-29), in favor of having each patch_id struct point to a > "struct commit". We could revert that, or turn that pointer into a list, > but it gets worse. Later, b3dfeebb92 (rebase: avoid computing > unnecessary patch IDs, 2016-07-29) built on that by lazily computing the > patch-ids. So each patch_id struct must point to exactly one commit! > > We'll have to either: > > - split them apart into two structs at the time of the lazy computation > > or > > - accept having duplicate patch_ids, but on lookup make sure we find > _all_ matches, not just the first one. > > Hmm, that last one might not be too bad, since we have > hashmap_get_next() (it does mean iterating over a chained bucket > linearly, but that's already the worst-case for a hash lookup). > > -Peff