On Fri, Jan 15, 2021 at 10:43 AM Elijah Newren <newren@xxxxxxxxx> wrote: > > Hi, > > On Thu, Jan 14, 2021 at 6:00 PM Phil Hord <phil.hord@xxxxxxxxx> wrote: > > > > I noticed this is still only in my local branch. Can I get an ACK/NAK? > > Sorry for missing this when you posted in August. Thanks for sending > in the update from v1. > > For other reviewers: v1 is over here: > https://lore.kernel.org/git/20190808035935.30023-1-phil.hord@xxxxxxxxx/, > and has review comments from Martin, me, Peff, and Junio. > > > On Tue, Aug 13, 2019 at 7:49 PM Phil Hord <phil.hord@xxxxxxxxx> wrote: > >> > >> From: Phil Hord <phil.hord@xxxxxxxxx> > >> > >> 'git tag -d' and 'git branch -d' both accept one or more refs to > >> delete, but each deletion is done by calling `delete_ref` on each argv. > >> This is very slow when removing from packed refs as packed-refs is > >> locked and rewritten each time. Use delete_refs instead so all the > >> removals can be done inside a single transaction with a single update. > > Awesome, thanks for also fixing up git branch with v2. > > >> Since delete_refs performs all the packed-refs delete operations > >> inside a single transaction, if any of the deletes fail then all > >> them will be skipped. In practice, none of them should fail since > >> we verify the hash of each one before calling delete_refs, but some > >> network error or odd permissions problem could have different results > >> after this change. > >> > >> Also, since the file-backed deletions are not performed in the same > >> transaction, those could succeed even when the packed-refs transaction > >> fails. > >> > >> After deleting refs, report the deletion's success only if the ref was > >> actually deleted. For branch deletion, remove the branch config only > >> if the branch ref is actually removed. > >> > >> A manual test deleting 24,000 tags took about 30 minutes using > >> delete_ref. It takes about 5 seconds using delete_refs. > > As I said on v1, it's really nice to have this fixed. Thanks for doing it. > > >> > >> Signed-off-by: Phil Hord <phil.hord@xxxxxxxxx> > >> --- > >> This reroll adds the same delete_refs change to 'git branch'. It checks > >> individual refs after the operation to report correctly on each whether > >> it was successfully deleted or not. Maybe this is an unnecessary step, > >> though. This handles the weird case where some file system error > >> prevented us from deleting refs, leaving us with an error from > >> delete_refs but without any idea which refs might have been affected. > >> > >> builtin/branch.c | 50 +++++++++++++++++++++++++++++------------------- > >> builtin/tag.c | 45 +++++++++++++++++++++++++++++++++---------- > >> 2 files changed, 65 insertions(+), 30 deletions(-) > >> > >> diff --git a/builtin/branch.c b/builtin/branch.c > >> index 2ef214632f..2273239f41 100644 > >> --- a/builtin/branch.c > >> +++ b/builtin/branch.c > >> @@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds, > >> int remote_branch = 0; > >> struct strbuf bname = STRBUF_INIT; > >> unsigned allowed_interpret; > >> + struct string_list refs_to_delete = STRING_LIST_INIT_DUP; > >> + struct string_list_item *item; > >> + int refname_pos = 0; > >> > >> switch (kinds) { > >> case FILTER_REFS_REMOTES: > >> @@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds, > >> /* For subsequent UI messages */ > >> remote_branch = 1; > >> allowed_interpret = INTERPRET_BRANCH_REMOTE; > >> - > >> + refname_pos = 13; > >> force = 1; > >> break; > >> case FILTER_REFS_BRANCHES: > >> fmt = "refs/heads/%s"; > >> allowed_interpret = INTERPRET_BRANCH_LOCAL; > >> + refname_pos = 11; > >> break; > >> default: > >> die(_("cannot use -a with -d")); > >> @@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds, > >> goto next; > >> } > >> > >> - if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid, > >> - REF_NO_DEREF)) { > >> - error(remote_branch > >> - ? _("Error deleting remote-tracking branch '%s'") > >> - : _("Error deleting branch '%s'"), > >> - bname.buf); > >> - ret = 1; > >> - goto next; > > The code used to set the return code to 1 if it failed to delete a branch > > >> - } > >> - if (!quiet) { > >> - printf(remote_branch > >> - ? _("Deleted remote-tracking branch %s (was %s).\n") > >> - : _("Deleted branch %s (was %s).\n"), > >> - bname.buf, > >> - (flags & REF_ISBROKEN) ? "broken" > >> - : (flags & REF_ISSYMREF) ? target > >> - : find_unique_abbrev(&oid, DEFAULT_ABBREV)); > >> - } > >> - delete_branch_config(bname.buf); > >> + item = string_list_append(&refs_to_delete, name); > >> + item->util = xstrdup((flags & REF_ISBROKEN) ? "broken" > >> + : (flags & REF_ISSYMREF) ? target > >> + : find_unique_abbrev(&oid, DEFAULT_ABBREV)); > >> > >> next: > >> free(target); > >> } > >> > >> + delete_refs(NULL, &refs_to_delete, REF_NO_DEREF); > >> + > >> + for_each_string_list_item(item, &refs_to_delete) { > >> + char * describe_ref = item->util; > >> + char * name = item->string; > >> + if (ref_exists(name)) > >> + ret = 1; > > Now it sets the return code if the branch still exists after trying to > delete. I thought that was subtly different...but I tried doing a > branch deletion of a non-existent branch since I thought that would be > the only difference -- however, that errors out earlier in the > codepath before even getting to the stage of deleting refs. So I > think these are effectively the same. > > >> + else { > >> + char * refname = name + refname_pos; > >> + if (!quiet) > >> + printf(remote_branch > >> + ? _("Deleted remote-tracking branch %s (was %s).\n") > >> + : _("Deleted branch %s (was %s).\n"), > >> + name + refname_pos, describe_ref); > > Neither remote_branch nor refname_pos are changing throughout this > loop, which I at first thought was in error, but it looks like git > branch only allows you to delete one type or the other -- not a > mixture. So this is correct. > > >> + > >> + delete_branch_config(refname); > >> + } > >> + free(describe_ref); > >> + } > >> + string_list_clear(&refs_to_delete, 0); > >> + > >> free(name); > >> strbuf_release(&bname); > >> > >> diff --git a/builtin/tag.c b/builtin/tag.c > >> index e0a4c25382..0d11ffcd04 100644 > >> --- a/builtin/tag.c > >> +++ b/builtin/tag.c > >> @@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting, > >> } > >> > >> typedef int (*each_tag_name_fn)(const char *name, const char *ref, > >> - const struct object_id *oid, const void *cb_data); > >> + const struct object_id *oid, void *cb_data); > >> > >> static int for_each_tag_name(const char **argv, each_tag_name_fn fn, > >> - const void *cb_data) > >> + void *cb_data) > >> { > >> const char **p; > >> struct strbuf ref = STRBUF_INIT; > >> @@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn, > >> return had_error; > >> } > >> > >> -static int delete_tag(const char *name, const char *ref, > >> - const struct object_id *oid, const void *cb_data) > >> +static int collect_tags(const char *name, const char *ref, > >> + const struct object_id *oid, void *cb_data) > >> { > >> - if (delete_ref(NULL, ref, oid, 0)) > >> - return 1; > > This used to return 1 if it failed to delete a ref. > > >> - printf(_("Deleted tag '%s' (was %s)\n"), name, > >> - find_unique_abbrev(oid, DEFAULT_ABBREV)); > >> + struct string_list *ref_list = cb_data; > >> + > >> + string_list_append(ref_list, ref); > >> + ref_list->items[ref_list->nr - 1].util = oiddup(oid); > >> return 0; > > Now it unconditionally returns 0. > > >> } > >> > >> +static int delete_tags(const char **argv) > >> +{ > >> + int result; > >> + struct string_list refs_to_delete = STRING_LIST_INIT_DUP; > >> + struct string_list_item *item; > >> + > >> + result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete); > >> + delete_refs(NULL, &refs_to_delete, REF_NO_DEREF); > > You now only look at the result of collecting the tags, and ignore the > result of trying to delete them... > > >> + > >> + for_each_string_list_item(item, &refs_to_delete) { > >> + const char * name = item->string; > >> + struct object_id * oid = item->util; > >> + if (ref_exists(name)) > >> + result = 1; > > ...except that you check if the refs still exist afterward and set the > return code based on it. Like with the branch case, I can't come up > with a case where the difference matters. I suspect there's a race > condition there somewhere, but once you start going down that road I > think the old code may have had a bunch of races too. It might be > nice to document with a comment that there's a small race condition > with someone else trying to forcibly re-create the ref at the same > time you are trying to delete, but I don't think it's a big deal. > > If you did use the result of delete_refs(), you might have to double > check that the callers (git.c:handle_builtin() -> git.c:run_builtin() > -> builtin/tag.c:cmd_tag() -> builtin/tag.c:delete_tags()) are all > okay with the return code; it looks like handle_builtin() would pass > the return code to exit() and the git-tag manpage doesn't document the > return status, so you've at least got some leeway in terms of what > values are acceptable. Or you could just normalize the return value > of delete_refs() down to 0 or 1. But you'd only need to worry about > that if the race condition is something we're worried enough to > tackle. Interesting. I was worried about imposing a requirement on delete_refs that any non-zero return must mean that something was not deleted which should have been. Maybe that's not such a worry, though, and it would be acceptable to return a 1 even if all the refs were deleted even though some error occurred further down the line. I tried normalizing the value and then also verifying each ref was removed, but that seemed wrong. Maybe it's ok to just normalize it and not react to still-existing refs. > >> + else > >> + printf(_("Deleted tag '%s' (was %s)\n"), > >> + item->string + 10, > >> + find_unique_abbrev(oid, DEFAULT_ABBREV)); > >> + > >> + free(oid); > >> + } > >> + string_list_clear(&refs_to_delete, 0); > >> + return result; > >> +} > >> + > >> static int verify_tag(const char *name, const char *ref, > >> - const struct object_id *oid, const void *cb_data) > >> + const struct object_id *oid, void *cb_data) > >> { > >> int flags; > >> const struct ref_format *format = cb_data; > >> @@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix) > >> if (filter.merge_commit) > >> die(_("--merged and --no-merged options are only allowed in list mode")); > >> if (cmdmode == 'd') > >> - return for_each_tag_name(argv, delete_tag, NULL); > >> + return delete_tags(argv); > >> if (cmdmode == 'v') { > >> if (format.format && verify_ref_format(&format)) > >> usage_with_options(git_tag_usage, options); > >> -- > >> 2.23.0.rc1.174.g4cc1b04b4c > > Overall, I like the patch. Peff commented on v1 that the basic idea > (use the part of the refs API that batches operations) is the right > thing to do. I'm not that familiar with refs-touching code, but your > patch makes sense to me. I think I spotted a minor issue (you ignore > the return status of delete_refs(), then later check the existence of > the refs afterwards to determine success, which I believe is a minor > and unlikely race condition), but I'm not sure it's worth fixing; > perhaps just mark it with #leftoverbits and move on -- the faster > branch and tag deletion is a very nice improvement. > > I notice Martin said on v1 that there was a testcase that had problems > with your patch; I tested v2 and it looks like you fixed any such > issues. I think you also addressed the feedback from Junio, though > his comments about the return code and the minor race condition I > noticed around it might mean it'd be good to get his comments. > > Anyway, > Acked-by: Elijah Newren <newren@xxxxxxxxx> > > I would say Reviewed-by, but I'd like to get Junio's comments on the > return code and minor race. Thanks for the detailed review and thoughts.