YAY! Thanks a lot :) -- Arnaud Morin On 12.01.21 - 10:52, Jeff King wrote: > On Tue, Jan 12, 2021 at 03:34:38PM +0000, Arnaud Morin wrote: > > > So instead of: > > id = has_commit_patch_id(commit, &ids); > > > > We should use some sort of iterator, in order to call: > > id->commit->object.flags |= cherry_flag; > > > > for all commits which are related to this patch? > > Yep, exactly. > > Here it is as a patch. :) > > -- >8 -- > Subject: [PATCH] patch-ids: handle duplicate hashmap entries > > This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a > hand-rolled hashmap implementation, 2016-07-29) in which > > git rev-list --cherry-pick A...B > > will fail to suppress commits reachable from A even if a commit with > matching patch-id appears in B. > > Around the time of that commit, the algorithm for "--cherry-pick" looked > something like this: > > 0. Traverse all of the commits, marking them as being on the left or > right side of the symmetric difference. > > 1. Iterate over the left-hand commits, inserting a patch-id struct for > each into a hashmap, and pointing commit->util to the patch-id > struct. > > 2. Iterate over the right-hand commits, checking which are present in > the hashmap. If so, we exclude the commit from the output _and_ we > mark the patch-id as "seen". > > 3. Iterate again over the left-hand commits, checking whether > commit->util->seen is set; if so, exclude them from the output. > > At the end, we'll have eliminated commits from both sides that have a > matching patch-id on the other side. But there's a subtle assumption > here: for any given patch-id, we must have exactly one struct > representing it. If two commits from A both have the same patch-id and > we allow duplicates in the hashmap, then we run into a problem: > > a. In step 1, we insert two patch-id structs into the hashmap. > > b. In step 2, our lookups will find only one of these structs, so only > one "seen" flag is marked. > > c. In step 3, one of the commits in A will have its commit->util->seen > set, but the other will not. We'll erroneously output the latter. > > Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards, > it used hashmap_add(), which explicitly does allow duplicates. > > At that point, the solution would have been easy: when we are about to > add a duplicate, skip doing so and return the existing entry which > matches. But it gets more complicated. > > In 683f17ec44 (patch-ids: replace the seen indicator with a commit > pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, > when the right-hand side finds a matching patch_id from the left-hand > side, we can directly mark the left-hand patch_id->commit to be omitted. > Solving that would be easy, too; there's a one-to-many relationship of > patch-ids to commits, so we just need to keep a list. > > But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary > patch IDs, 2016-07-29) built on that by lazily computing the full > patch-ids. So we don't even know when adding to the hashmap whether two > commits truly have the same id. We'd have to tentatively assign them a > list, and then possibly split them apart (possibly into N new structs) > at the moment we compute the real patch-ids. This could work, but it's > complicated and error-prone. > > Instead, let's accept that we may store duplicates, and teach the lookup > side to be more clever. Rather than asking for a single matching > patch-id, it will need to iterate over all matching patch-ids. This does > mean examining every entry in a single hash bucket, but the worst-case > for a hash lookup was already doing that. > > We'll keep the hashmap details out of the caller by providing a simple > iteration interface. We can retain the simple has_commit_patch_id() > interface for the other callers, but we'll simplify its return value > into an integer, rather than returning the patch_id struct. That way > they won't be tempted to look at the "commit" field of the return value > without iterating. > > Reported-by: Arnaud Morin <arnaud.morin@xxxxxxxxx> > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > patch-ids.c | 14 +++++++++++++- > patch-ids.h | 20 +++++++++++++++++++- > revision.c | 6 ++++-- > t/t6007-rev-list-cherry-pick-file.sh | 12 ++++++++++++ > 4 files changed, 48 insertions(+), 4 deletions(-) > > diff --git a/patch-ids.c b/patch-ids.c > index 21973e4933..f51021a0cb 100644 > --- a/patch-ids.c > +++ b/patch-ids.c > @@ -89,7 +89,7 @@ static int init_patch_id_entry(struct patch_id *patch, > return 0; > } > > -struct patch_id *has_commit_patch_id(struct commit *commit, > +struct patch_id *patch_id_iter_first(struct commit *commit, > struct patch_ids *ids) > { > struct patch_id patch; > @@ -104,6 +104,18 @@ struct patch_id *has_commit_patch_id(struct commit *commit, > return hashmap_get_entry(&ids->patches, &patch, ent, NULL); > } > > +struct patch_id *patch_id_iter_next(struct patch_id *cur, > + struct patch_ids *ids) > +{ > + return hashmap_get_next_entry(&ids->patches, cur, ent); > +} > + > +int has_commit_patch_id(struct commit *commit, > + struct patch_ids *ids) > +{ > + return !!patch_id_iter_first(commit, ids); > +} > + > struct patch_id *add_commit_patch_id(struct commit *commit, > struct patch_ids *ids) > { > diff --git a/patch-ids.h b/patch-ids.h > index 03bb04e707..ab6c6a6804 100644 > --- a/patch-ids.h > +++ b/patch-ids.h > @@ -23,7 +23,25 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, > struct object_id *oid, int, int); > int init_patch_ids(struct repository *, struct patch_ids *); > int free_patch_ids(struct patch_ids *); > + > +/* Add a patch_id for a single commit to the set. */ > struct patch_id *add_commit_patch_id(struct commit *, struct patch_ids *); > -struct patch_id *has_commit_patch_id(struct commit *, struct patch_ids *); > + > +/* Returns true if the patch-id of "commit" is present in the set. */ > +int has_commit_patch_id(struct commit *commit, struct patch_ids *); > + > +/* > + * Iterate over all commits in the set whose patch id matches that of > + * "commit", like: > + * > + * struct patch_id *cur; > + * for (cur = patch_id_iter_first(commit, ids); > + * cur; > + * cur = patch_id_iter_next(cur, ids) { > + * ... look at cur->commit > + * } > + */ > +struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *); > +struct patch_id *patch_id_iter_next(struct patch_id *cur, struct patch_ids *); > > #endif /* PATCH_IDS_H */ > diff --git a/revision.c b/revision.c > index 9dff845bed..7ec9b634e3 100644 > --- a/revision.c > +++ b/revision.c > @@ -1241,12 +1241,14 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs) > /* > * Have we seen the same patch id? > */ > - id = has_commit_patch_id(commit, &ids); > + id = patch_id_iter_first(commit, &ids); > if (!id) > continue; > > commit->object.flags |= cherry_flag; > - id->commit->object.flags |= cherry_flag; > + do { > + id->commit->object.flags |= cherry_flag; > + } while ((id = patch_id_iter_next(id, &ids))); > } > > free_patch_ids(&ids); > diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh > index f0268372d2..8bf5ae23c2 100755 > --- a/t/t6007-rev-list-cherry-pick-file.sh > +++ b/t/t6007-rev-list-cherry-pick-file.sh > @@ -245,6 +245,18 @@ test_expect_success '--count --left-right' ' > test_cmp expect actual > ' > > +test_expect_success '--cherry-pick with duplicates on each side' ' > + git checkout -b dup-orig && > + test_commit dup-base && > + git revert dup-base && > + git cherry-pick dup-base && > + git checkout -b dup-side HEAD~3 && > + test_tick && > + git cherry-pick -3 dup-orig && > + git rev-list --cherry-pick dup-orig...dup-side >actual && > + test_must_be_empty actual > +' > + > # Corrupt the object store deliberately to make sure > # the object is not even checked for its existence. > remove_loose_object () { > -- > 2.30.0.455.gdab6b73766 >