Re: [PATCH v6 1/2] worktree: add 'for_each_worktree' function

Mike Rappazzo <rappazzo@xxxxxxxxx> · Mon, 31 Aug 2015 14:57:09 -0400

On Mon, Aug 31, 2015 at 1:11 AM, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> Thanks for working on this. I apologize for not reviewing earlier
> rounds (other than v2 [1]); it's been difficult to find a block of
> time to do so...

I appreciate your time and comments.

>
> On Sun, Aug 30, 2015 at 3:10 PM, Michael Rappazzo <rappazzo@xxxxxxxxx> wrote:
>> for_each_worktree iterates through each worktree and invokes a callback
>> function.  The main worktree (if not bare) is always encountered first,
>> followed by worktrees created by `git worktree add`.
>
> Why does this iteration function specially filter out a bare main
> worktree? If it instead unconditionally vends the main worktree (bare
> or not), then the caller can make its own decision about what to do
> with it, thus empowering the caller, rather than imposing a (possibly)
> arbitrary restriction upon it.
>
> For instance, the "git worktree list" command may very well want to
> show the main worktree, even if bare (possibly controlled by a
> command-line option), annotated appropriately ("[bare]"). This may be
> exactly the sort of information a user wants to know, and by leaving
> the decision up to the caller, then the caller ("git worktree list" in
> this example) has the opportunity to act accordingly, whereas if
> for_each_worktree() filters out a bare main worktree unconditionally,
> then the caller ("git worktree list") will never be able to offer such
> an option.

I wasn't sure that a bare repo would be considered a worktree.  I
don't think that it would be
a good idea to include it.  In the same vein that I can't checkout a
branch in a bare repo, it
figure that it shouldn't be in the list.

>
>> If the callback function returns a non-zero value, iteration stops, and
>> the callback's return value is returned from the for_each_worktree
>> function.
>
> Stepping back a bit, is a for-each-foo()-style interface desirable?
> This sort of interface imposes a good deal of complexity on callers,
> demanding a callback function and callback data (cb_data), and is
> generally (at least in C) more difficult to reason about than other
> simpler interfaces. Is such complexity warranted?
>
> An alternate, much simpler interface would be to have a function, say
> get_worktrees(), return an array of 'worktree' structures to the
> caller, which the caller would iterate over (which is a common
> operation in C, thus easily reasoned about).
>
> The one benefit of a for-each-foo()-style interface is that it's
> possible to "exit early", thus avoiding the cost of interrogating
> meta-data for worktrees in which the caller is not interested,
> however, it seems unlikely that there will be so many worktrees linked
> to a repository for this early exit to translate into any real
> savings.

I am not opposed to making a simple function as you describe.  I think David was
looking for a callback style function.  I don't think it would be
terrible to keep the
callback and then also include the simple function to return the
struct array.  I like
the memory management of the callback better than the struct array though.

>
>> Signed-off-by: Michael Rappazzo <rappazzo@xxxxxxxxx>
>> ---
>> diff --git a/builtin/worktree.c b/builtin/worktree.c
>> index 430b51e..7b3cb96 100644
>> --- a/builtin/worktree.c
>> +++ b/builtin/worktree.c
>> @@ -26,6 +26,14 @@ static int show_only;
>>  static int verbose;
>>  static unsigned long expire;
>>
>> +/*
>> + * The signature for the callback function for the for_each_worktree()
>> + * function below.  The memory pointed to by the callback arguments
>> + * is only guaranteed to be valid for the duration of a single
>> + * callback invocation.
>> + */
>> +typedef int each_worktree_fn(const char *path, const char *git_dir, void *cb_data);
>
> In my review[1] of v2, I mentioned that (at some point) the "git
> worktree list" command might like to show additional information about
> the worktree other than just its location. Such information might
> include its tag, the checked out branch (or detached HEAD), whether
> it's locked (and the lock reason and whether the worktree is currently
> "online"), whether it can be pruned (and the prune reason).
>
> Commands such as "git worktree add" and "git checkout" need to know if
> a branch is already checked out in some (other) worktree, thus they
> also need the "branch" information.
>
> This need by clients for additional worktree meta-data suggests that
> the additional information ought to be encapsulated into a 'struct
> worktree', and that for_each_worktree() should vend a 'struct
> worktree' rather than vending merely the "git dir". (Alternately, the
> above-suggested get_worktrees() should return an array of 'struct
> worktree' items.)
>
> An important reason for making for_each_worktree() vend a 'struct
> worktree' is that it facilitates centralizing all the logic for
> determining and computing the extra worktree meta-data in one place,
> thus relieving callers of burden of having to compute the extra
> information themselves. (Junio alluded to this in his v5 review[2].)

I think that this makes sense.  I will refactor to use a struct, and
include that in
the callback signature and/or the struct array function.

>
>>  static int prune_worktree(const char *id, struct strbuf *reason)
>>  {
>>         struct stat st;
>> @@ -359,6 +367,81 @@ static int add(int ac, const char **av, const char *prefix)
>>         return add_worktree(path, branch, &opts);
>>  }
>>
>> +/*
>> + * Iterate through each worktree and invoke the callback function.  If
>> + * the callback function returns non-zero, the iteration stops, and
>> + * this function returns that return value
>> + */
>> +static int for_each_worktree(each_worktree_fn fn, void *cb_data)
>
> Ultimately, code outside of builtin/worktree.c will want to benefit
> from the encapsulation of this worktree iteration logic. For instance,
> the support code in (top-level) branch.c invoked by "git worktree add"
> and "git checkout" to determine if a branch is already checked out in
> some (other) branch could avail itself of this iteration interface.
>
> As such, it would make sense to place this code at location where it
> can be accessed by callers other than just builtin/worktree.c. A good
> location for it to reside might be in a new top-level file worktree.c.
>
> It doesn't have to be part of the current patch series, but,
> eventually, the code in branch.c for determining if a branch is
> already checked out elsewhere also should migrate to the new top-level
> worktree.c; in particular, die_if_checked_out(), find_shared_symref(),
> and find_linked_symref().
>
>> +{
>> +       struct strbuf worktree_path = STRBUF_INIT;
>> +       struct strbuf worktree_git = STRBUF_INIT;
>> +       const char *common_dir;
>> +       int main_is_bare = 0;
>> +       int ret = 0;
>> +
>> +       common_dir = get_git_common_dir();
>> +       if (!strcmp(common_dir, get_git_dir())) {
>> +               /* simple case - this is the main repo */
>> +               main_is_bare = is_bare_repository();
>> +               if (!main_is_bare) {
>> +                       strbuf_addstr(&worktree_path, get_git_work_tree());
>> +               } else {
>> +                       strbuf_addstr(&worktree_path, common_dir);
>> +               }
>> +       } else {
>> +               strbuf_addstr(&worktree_path, common_dir);
>> +               /* If the common_dir DOES NOT end with '/.git', then it is bare */
>> +               main_is_bare = !strbuf_strip_suffix(&worktree_path, "/.git");
>> +       }
>> +       strbuf_addstr(&worktree_git, worktree_path.buf);
>
> I may have missed some discussion in earlier rounds (or perhaps I'm
> too simple-minded), but I'm confused about why this logic (and most of
> the rest of the function) differs so much from existing logic in
> branch.c:find_shared_symref() and find_linked_symref() for iterating
> over the worktrees and gleaning information about them. That logic in
> branch.c seems to do a pretty good job of reporting the worktree in
> which a branch is already checked out, so it's not clear why the above
> logic takes a different (and seemingly more complex) approach.

This is due to my unfamiliarity of the code api.  I keep trying to
look for the right
functions to use, but I miss them.  Sorry.  I will rework using those functions.

>
>> +       if (!main_is_bare) {
>> +               strbuf_addstr(&worktree_git, "/.git");
>> +
>> +               ret = fn(worktree_path.buf, worktree_git.buf, cb_data);
>> +               if (ret)
>> +                       goto done;
>> +       }
>> +       strbuf_addstr(&worktree_git, "/worktrees");
>> +
>> +       if (is_directory(worktree_git.buf)) {
>
> As mentioned in my v2 review[1], this is_directory() invocation
> doesn't buy you anything. The following opendir() will either succeed
> or fail anyhow, so checking beforehand if 'worktree_git' is a
> directory is wasted work. If you eliminate is_directory(), you avoid
> that unnecessary work (and the rest of the code can be less deeply
> indented).

For some reason, in my testing, calling opendir was giving a
segmentation fault regardless of how I checked the return value.
Including the is_directory check allowed me to avoid that.

>
>> +               DIR *dir = opendir(worktree_git.buf);
>> +               if (dir) {
>> +                       struct stat st;
>> +                       struct dirent *d;
>> +                       size_t base_path_len = worktree_git.len;
>> +
>> +                       while ((d = readdir(dir)) != NULL) {
>> +                               if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
>> +                                       continue;
>> +
>> +                               strbuf_setlen(&worktree_git, base_path_len);
>> +                               strbuf_addf(&worktree_git, "/%s/gitdir", d->d_name);
>> +
>> +                               if (stat(worktree_git.buf, &st)) {
>> +                                       fprintf(stderr, "Unable to read worktree git dir: %s\n", worktree_git.buf);
>> +                                       continue;
>> +                               }
>> +
>> +                               strbuf_reset(&worktree_path);
>> +                               strbuf_read_file(&worktree_path, worktree_git.buf, st.st_size);
>> +                               strbuf_strip_suffix(&worktree_path, "/.git\n");
>> +
>> +                               strbuf_strip_suffix(&worktree_git, "/gitdir");
>> +                               ret = fn(worktree_path.buf, worktree_git.buf, cb_data);
>> +                               if (ret)
>> +                                       break;
>> +                       }
>> +               }
>> +               closedir(dir);
>
> This closedir() should be inside the `if (dir) {...}' block rather than outside.
>
> Returning to the issue of this code differing from existing code in
> branch.c for iterating worktrees, I'm curious as to why this
> functionality was implemented again from scratch here rather than
> re-using the existing (somewhat) well proven code and logic in
> branch.c:find_shared_symref() and find_linked_symref(). My v2
> review[1] hinted at re-use a couple times.
>
> Considering that the existing code in branch.c for iterating worktrees
> has gone through several code reviews and is already (somewhat)
> proven, re-using that code for a general purpose worktree iterator
> makes lots of sense compared with writing it again from scratch, as is
> done here. Consequently, I, personally, would be happier to see the
> existing code refactored (perhaps over several patches) to the point
> that it can be re-used as a general purpose iterator. But, that's my
> opinion; I can't speak for others.
>
>> +       }
>> +done:
>> +       strbuf_release(&worktree_git);
>> +       strbuf_release(&worktree_path);
>> +       return ret;
>> +}
>
> [1]: http://article.gmane.org/gmane.comp.version-control.git/275528
> [2]: http://article.gmane.org/gmane.comp.version-control.git/276471
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html