Re: [PATCH 1/1] worktree: delete branches auto-created by 'worktree add'

Eric Sunshine <sunshine@xxxxxxxxxxxxx> · Fri, 27 Dec 2019 06:05:05 -0500

(Sorry for taking so long to review this patch; it ended up being a
quite lengthy review.)

On Sat, Dec 14, 2019 at 11:16 AM Pratyush Yadav <me@xxxxxxxxxxxxxxxxx> wrote:
> When no branch name is supplied to 'worktree add', it creates a new
> branch based on the name of the directory the new worktree is located
> in. But when the worktree is later removed, that created branch is left
> over.

This is describing the existing (intentional) behavior but doesn't
explain why this might be annoying or problematic. To help sell the
patch, it might make sense to say something about how the behavior can
trip up newcomers to git-worktree, leaving them to wonder why they are
accumulating so many branches that they weren't aware they created. A
comment about why you think "git worktree add -d foo" is not a viable
way to side-step the creation of unwanted branches might also be
worthwhile. For instance, you might say something about how newcomers
might not read the documentation thoroughly enough to know about
--detach or to understand what it means; indeed, some newcomers to Git
presumably have trouble with the concept of a detached HEAD and may
find it scary.

> Remove that branch when removing the worktree. To make sure no commits
> are lost, the branch won't be deleted if it has moved.

My knee-jerk reaction upon reading the first sentence of this
paragraph was that this is a significant and undesirable behavior
change, however, the second sentence helps to allay my fears about it.

It's possible, I suppose, that there is some existing tooling
somewhere which relies upon the current behavior, but it's hard to
imagine any good reason to do so. (That is, "git worktree add foo &&
git worktree remove foo" is just a glorified and expensive way to say
"git branch foo".) So, I don't look upon this change with disfavor; it
could well be beneficial for newcomers, and perhaps a nice convenience
in general.

However, there is a rather serious flaw in the implementation. My
expectation is that it should only automatically delete a branch if
the branch creation was inferred; it should never automatically delete
a branch which was created explicitly. You kind of have this covered
(and even have a test for it), but it doesn't work correctly when the
user explicitly requests branch creation via -b/-B and the branch name
matches the worktree name. For instance:

    git worktree add -b foo foo
    git worktree remove foo

incorrectly automatically removes branch "foo" even though the user
requested its creation explicitly.

Another big question: Should an automatically-created branch be
deleted automatically when a worktree is pruned? That is, although
this sequence will remove an automatically-created branch:

    git worktree add foo
    git worktree remove foo

the current patch will not clean up the branch given this sequence:

    git worktree add foo
    rm -rf foo
    git worktree prune

Either way, it might be worthwhile to update the documentation to mention this.

> An example use case of when something like this is useful is when the
> user wants to check out a separate worktree to run and test on an older
> version, but don't want to touch the current worktree. So, they create a
> worktree, run some tests, and then remove it. But this leaves behind a
> branch the user never created in the first place.

The last sentence isn't exactly accurate. The user _did_ create the
branch. It would be more accurate to say "...the user did not
necessarily _intentionally_ create..." or something like that.

> So, remove the branch if nothing was done on it.

By the way, the ordering of the commit message paragraphs is a bit
off; it somewhat tries to justifies the change before explaining what
the problem is. I'd suggest this order:

    - describe current behavior
    - explain why current behavior can be undesirable in some circumstances;
      cite your example use-case here, perhaps
    - describe how this patch improves the situation

The two paragraphs which talk about "remove the branch" are just
repeating one another. I would drop one of them and keep the other as
the final bullet point of the suggested commit message order.

> Signed-off-by: Pratyush Yadav <me@xxxxxxxxxxxxxxxxx>
> ---
> diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
> @@ -73,8 +73,9 @@ If `<commit-ish>` is omitted and neither `-b` nor `-B` nor `--detach` used,
>  doesn't exist, a new branch based on HEAD is automatically created as
> -if `-b <branch>` was given.  If `<branch>` does exist, it will be
> -checked out in the new worktree, if it's not checked out anywhere
> +if `-b <branch>` was given.  In this case, if `<branch>` is not moved, it is
> +automatically deleted when the worktree is removed.  If `<branch>` does exist,
> +it will be checked out in the new worktree, if it's not checked out anywhere

I found it confusing to find automatic branch deletion described here
under the "worktree add" command...

> @@ -108,6 +109,10 @@ Remove a working tree. Only clean working trees (no untracked files
> +Removing a working tree might lead to its associated branch being deleted if
> +it was auto-created and has not moved since. See `add` for more information on
> +when exactly this can happen.

Subjectively, it seems more natural to fully discuss automatic branch
removal here rather than referring to the discussion of "worktree
add".

> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -35,6 +35,7 @@ struct add_opts {
> +static int auto_create;

I think this variable belongs in the 'add_opts' structure rather than
being file-global.

> @@ -270,11 +271,13 @@ static int add_worktree(const char *path, const char *refname,
> -       int len, ret;
> +       int len, ret, fd;
> +       struct object_id oid;
> +       char *hex;

Rather than declaring 'fd', 'oid', and 'hex' here, how about declaring
them in the scope of the "if (auto_create) {" conditional below, which
is the only place they are used?

> @@ -353,6 +356,18 @@ static int add_worktree(const char *path, const char *refname,
> +       strbuf_reset(&sb);
> +       strbuf_addf(&sb, "%s/auto_created", sb_repo.buf);

Why aren't these two lines inside the "if (auto_create) {" conditional
below? They seem to be used only for that case.

I think this new worktree metadata file warrants a documentation
update. In particular, gitrepository-layout.txt talks about
worktree-specific metadata files, and the "Details" section of
git-worktree.txt may need an update.

A bit of bikeshedding regarding the filename: "auto_created" is rather
unusual. Most names in the .git hierarchy are short and sweet. Also,
with the exception of ORIG_HEAD and FETCH_HEAD, all other multi-word
filenames seem to use hyphen rather than underscore, which suggests
"auto-created" would be a better choice. However, I'd probably drop
the hyphen altogether. Finally, "auto_created", alone, does not
necessarily convey that the branch was auto-created; someone could
misinterpret it as meaning the worktree itself was auto-created, so I
wonder if a better name can be found.

A bigger question, though, is whether we really want to see new files
like this springing up in the .git/worktrees/<id>/ directory for each
new piece of metadata which belongs to a worktree. I ask because this
isn't the first such case in which some additional worktree-specific
metadata was proposed (see, for instance, [1]). So, I'm wondering if
we should have a more generalized solution, such as introducing a new
file which can hold any sort of metadata which comes along in the
future. In particular, I'm thinking about a file containing an
extensible set of "key: value" tuples, in which case the "auto
created" metadata would be just one of possibly many keys. For
instance:

    branch-auto-created-at: deadbeef

The above is a genuine question. I'm not demanding that this patch
implement it, but I think it deserves discussion and thought before
making a decision.

[1]: http://public-inbox.org/git/CAPig+cRGMEjVbJZKXOskN6=5zchisx7UuwW9ZKGwoq5GQZQ_rw@xxxxxxxxxxxxxx/

> +       /* Mark this branch as an "auto-created" one. */

This comment doesn't really say anything which the code itself isn't
already saying (especially if you move the strbuf_addf() call inside
the conditional), so the comment could be dropped.

> +       if (auto_create) {
> +               fd = xopen(sb.buf, O_WRONLY | O_CREAT | O_TRUNC, 0666);
> +               get_oid("HEAD", &oid);

Unless I'm mistaken, this is just wrong. You're grabbing the OID of
HEAD from the worktree in which "worktree add" is being invoked,
however, if the new branch name is DWIM'd from an existing
tracking-branch, then the OID should be that of the tracking-branch,
not HEAD of the current worktree. So, you should be using the OID
already looked up earlier in the function, 'commit->object.oid', which
should be correct for either case.

> +               hex = oid_to_hex(&oid);
> +               write_file_buf(sb.buf, hex, strlen(hex));
> +
> +               if (close(fd))
> +                       die(_("could not close '%s'"), sb.buf);
> +       }

Is there a reason you're creating the file in this rather manual
fashion rather than using write_file() as is already heavily used in
this code for creating all the other files residing in
.git/worktrees/<id>/?

This code is correctly sandwiched within the "is_junk" scope, which
means that "auto_created" will be cleaned up automatically, along with
other .git/worktrees/<id>/ files, if "worktree add" fails for some
reason. Good.

>         argv_array_pushf(&child_env, "%s=%s", GIT_DIR_ENVIRONMENT, sb_git.buf);
>         argv_array_pushf(&child_env, "%s=%s", GIT_WORK_TREE_ENVIRONMENT, path);
> @@ -576,6 +591,8 @@ static int add(int ac, const char **av, const char *prefix)
>                 if (run_command(&cp))
>                         return -1;
>                 branch = new_branch;
> +
> +               auto_create = 1;

Drop the unnecessary blank line.

By the way, this suffers from the problem that if "git worktree add
foo" fails for some reason, such as because path "foo" already exists,
then the new branch will _not_ be cleaned up automatically since that
failure will happen before "auto_created" is ever created (among other
reasons). But that's not a new issue; it's an existing flaw of
"worktree add" not cleaning up a branch it created before it discovers
that it can't actually create the target directory for some reason, so
I wouldn't expect you to fix that problem with this submission. (I'm
just mentioning it for completeness.)

> @@ -912,9 +929,10 @@ static int remove_worktree(int ac, const char **av, const char *prefix)
>                 OPT_END()
>         };
> -       struct strbuf errmsg = STRBUF_INIT;
> +       struct strbuf errmsg = STRBUF_INIT, sb = STRBUF_INIT, hex = STRBUF_INIT;
> -       int ret = 0;
> +       int ret = 0, delete_auto_created = 0;
> +       struct object_id oid;

Perhaps move the declarations of 'hex' and 'oid' into the scope where
they are used rather than making them global to the function.

> @@ -939,6 +957,23 @@ static int remove_worktree(int ac, const char **av, const char *prefix)
> +       /*
> +        * Check if we auto-created a branch for this worktree and it hasn't
> +        * moved since. Do it before the contents of the worktree get wiped.
> +        * Delete the branch later because it is checked out right now.
> +        */

Good useful comment.

> +       git_path_buf(&sb, "worktrees/%s/auto_created", wt->id);
> +       if (file_exists(sb.buf)) {
> +               strbuf_read_file(&hex, sb.buf, 0);

You can avoid an unnecessary race condition here by dropping the
file_exists() call altogether and just checking the return code of
strbuf_read_file() -- which you should probably be doing anyhow. If
strbuf_read_file() returns a non-negative value, then you know it
exists, so file_exists() is redundant.

> +               get_oid(wt->id, &oid);
> +

Drop the unnecessary blank line.

> +               if (strcmp(hex.buf, oid_to_hex(&oid)) == 0)
> +                       delete_auto_created = 1;

I was wondering if it would be more semantically correct to parse
'hex' into an 'oid' and compare them with oidcmp() rather than doing a
string comparison of the hex values (though I'm not sure it will
matter in practice).

> +       }
> +
> +       strbuf_release(&sb);
> +       strbuf_release(&hex);

Drop the unnecessary blank line.

> @@ -952,6 +987,17 @@ static int remove_worktree(int ac, const char **av, const char *prefix)
> +       if (delete_auto_created) {
> +               struct child_process cp = CHILD_PROCESS_INIT;
> +               cp.git_cmd = 1;
> +
> +               argv_array_push(&cp.args, "branch");
> +               argv_array_push(&cp.args, "-d");
> +               argv_array_push(&cp.args, wt->id);
> +
> +               ret |= run_command(&cp);
> +       }

Alternately:

    argv_array_pushl(&cp.args, "branch", "-d", wt->id, NULL);

However, I don't think it is correct to use 'wt->id' here as the
branch name since there is no guarantee that the <id> in
.git/worktrees/<id>/ matches the branch name with which the worktree
was created. For instance:

    git worktree add foo/bar existing-branch
    git worktree add baz/bar

will, due to name conflicts, create worktree metadata directories:

    .git/worktrees/bar
    .git/worktrees/bar1

where the first is associated with branch "existing-branch", and the
second is associated with new branch "bar". When you then invoke "git
worktree remove baz/bar", it will try removing a branch named "bar1",
not "bar" as intended. To fix this, I think you need to record the
original auto-created branch name in the "auto_created" metadata file
too, not just the OID.

Finally, make this code consistent with other existing similar code in
this file by dropping the unnecessary blank lines in this hunk.

> diff --git a/t/t2403-worktree-move.sh b/t/t2403-worktree-move.sh
> @@ -222,4 +222,49 @@ test_expect_success 'not remove a repo with initialized submodule' '
> +test_expect_success 'remove auto-created branch' '
> +       (
> +               git worktree add to-remove &&
> +               git worktree remove to-remove &&
> +               git branch -l to-remove >branch_list &&
> +               test_line_count = 0 branch_list
> +       )
> +'

I don't think there is any need for this test to be run in a subshell,
so you can drop the enclosing '(' and ')'.

I worry about using porcelain git-branch to check whether the branch
has actually been removed. Using git-rev-parse would likely be a more
direct and safe way to test it. For instance:

    git worktree add to-remove &&
    git worktree remove to-remove &&
    test_must_fail git rev-parse --verify -q to-remove

should be sufficient, I think. And, to be really thorough, you might say:

    test_might_fail git branch -D to-remove &&
    git worktree add to-remove &&
    git rev-parse --verify -q to-remove &&
    git worktree remove to-remove &&
    test_must_fail git rev-parse --verify -q to-remove

The above comments apply to the other new tests added by this patch, as well.

> +test_expect_success 'do not remove a branch that was not auto-created' '
> +       (
> +               git worktree add -b new_branch to-remove &&

Nit: The inconsistent mix of underscore and hyphen in names is odd.
Perhaps settle on one or the other (with a slight preference toward
hyphen).

> +               git worktree remove to-remove &&
> +               git branch -l new_branch >branch_list &&
> +               test_line_count = 1 branch_list &&

As noted earlier, although this particular case of a branch created
explicitly with -b works as expected by not deleting the branch, the
similar case:

    git worktree add -b to-remove to-remove &&

will incorrectly automatically delete the branch.

> +               git branch -d new_branch &&
> +               git branch foo &&
> +               git worktree add to-remove foo &&
> +               git worktree remove to-remove &&
> +               git branch -l foo >branch_list &&
> +               test_line_count = 1 branch_list &&
> +               git branch -d foo &&
> +               git branch to-remove &&
> +               git worktree add to-remove &&
> +               git worktree remove to-remove &&
> +               git branch -l to-remove >branch_list &&
> +               test_line_count = 1 branch_list &&
> +               git branch -d to-remove

If any code above this "git branch -d" fails, then it will never get
this far, thus won't remove "to-remove". To perform cleanup whether
the test succeeds or fails, you should use test_when_finished()
_early_ in the test:

    test_when_finished "git branch -d to-remove || :" &&

However, if you restructure the tests as suggested above, then you
might be able to get away without bothering with this cleanup.

> +       )
> +'

This test is checking three distinct cases of explicitly-created
branches. It would make it easier to debug a failing case if you split
it up into three tests -- one for each case.

> +test_expect_success 'do not remove auto-created branch that was moved' '
> +       (
> +               git worktree add to-remove &&
> +               cd to-remove &&
> +               test_commit foo &&
> +               cd ../ &&

We normally avoid cd'ing around in tests like this because it can
cause tests following this one to run in the wrong directory if
something above the "cd ../" fails. In this particular case, it
doesn't matter since the entire body of this test is within a
subshell.

However, if you take advantage of test_commits()'s -C argument, then
you can ditch the cd's and the subshell altogether:

    test_commit -C to-remove foo &&

> +               git worktree remove to-remove &&
> +               git branch -l to-remove >branch_list &&
> +               test_line_count = 1 branch_list &&
> +               git branch -D to-remove
> +       )
> +'