Re: [PATCH v5 2/4] ls-files: optionally recurse into submodules

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 28 Sep 2016 15:11:23 -0700

On Wed, Sep 28, 2016 at 2:50 PM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
> Allow ls-files to recognize submodules in order to retrieve a list of
> files from a repository's submodules.  This is done by forking off a
> process to recursively call ls-files on all submodules. Use top-level
> --super-prefix option to pass a path to the submodule which it can
> use to prepend to output or pathspec matching logic.
>
> Signed-off-by: Brandon Williams <bmwill@xxxxxxxxxx>
> ---
>  Documentation/git-ls-files.txt         |   7 +-
>  builtin/ls-files.c                     | 139 ++++++++++++++++++++++++---------
>  git.c                                  |   2 +-
>  t/t3007-ls-files-recurse-submodules.sh | 100 ++++++++++++++++++++++++
>  4 files changed, 208 insertions(+), 40 deletions(-)
>  create mode 100755 t/t3007-ls-files-recurse-submodules.sh
>
> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> index 0d933ac..446209e 100644
> --- a/Documentation/git-ls-files.txt
> +++ b/Documentation/git-ls-files.txt
> @@ -18,7 +18,8 @@ SYNOPSIS
>                 [--exclude-per-directory=<file>]
>                 [--exclude-standard]
>                 [--error-unmatch] [--with-tree=<tree-ish>]
> -               [--full-name] [--abbrev] [--] [<file>...]
> +               [--full-name] [--recurse-submodules]
> +               [--abbrev] [--] [<file>...]
>
>  DESCRIPTION
>  -----------
> @@ -137,6 +138,10 @@ a space) at the start of each line:
>         option forces paths to be output relative to the project
>         top directory.
>
> +--recurse-submodules::
> +       Recursively calls ls-files on each submodule in the repository.
> +       Currently there is only support for the --cached mode.
> +
>  --abbrev[=<n>]::
>         Instead of showing the full 40-byte hexadecimal object
>         lines, show only a partial prefix.
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index 00ea91a..e0e5cf5 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -14,6 +14,7 @@
>  #include "resolve-undo.h"
>  #include "string-list.h"
>  #include "pathspec.h"
> +#include "run-command.h"
>
>  static int abbrev;
>  static int show_deleted;
> @@ -28,8 +29,10 @@ static int show_valid_bit;
>  static int line_terminator = '\n';
>  static int debug_mode;
>  static int show_eol;
> +static int recurse_submodules;
>
>  static const char *prefix;
> +static const char *super_prefix;
>  static int max_prefix_len;
>  static int prefix_len;
>  static struct pathspec pathspec;
> @@ -68,6 +71,19 @@ static void write_eolinfo(const struct cache_entry *ce, const char *path)
>  static void write_name(const char *name)
>  {
>         /*
> +        * Prepend the super_prefix to name to construct the full_name to be
> +        * written.  'full_name' gets reused across output lines to minimize the
> +        * allocation churn.
> +        */

When doing these tricks with the allocation churn (i.e. we make it
hard to read and understand
for a reader, then we should do it completely, i.e. keep full_name in
the strbuf and
only do a strbuf_setlen to reset the buffer just a bit. With this
implementation we burden the
reader/user to understand how the memory is kept over multiple calls
to this function,
but we still do more work than expected). So either I'd not worry
about performance
and provide an 'obvious correct' implementation, with e.g. no static
here and we free the memory
correctly. Or you'd go the performance route, but then we'd usually
ask for numbers.
(How much faster is it; Does the trickyness trade off well to the
performance gain?)

> +       static struct strbuf full_name = STRBUF_INIT;
> +       if (super_prefix && *super_prefix) {

Why do we have to check twice here? Wouldn't just

    if (super_prefix) {
        ...

be enough?