Re: [PATCH 3/3] clean: improve performance when removing lots of directories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 6, 2015 at 7:48 AM, Erik Elfström <erik.elfstrom@xxxxxxxxx> wrote:
> Before this change, clean used resolve_gitlink_ref to check for the
> presence of nested git repositories. This had the drawback of creating
> a ref_cache entry for every directory that should potentially be
> cleaned. The linear search through the ref_cache list caused a massive
> performance hit for large number of directories.
>
> Teach clean.c:remove_dirs to use setup.c:is_git_directory
> instead. is_git_directory will actually open HEAD and parse the HEAD
> ref but this implies a nested git repository and should be rare when
> cleaning.
>
> Using is_git_directory should give a more standardized check for what
> is and what isn't a git repository but also gives a slight behavioral
> change. We will now detect and respect bare and empty nested git
> repositories (only init run). Update t7300 to reflect this.
>
> The time to clean an untracked directory containing 100000 sub
> directories went from 61s to 1.7s after this change.

Impressive.

> Signed-off-by: Erik Elfström <erik.elfstrom@xxxxxxxxx>
> Helped-by: Jeff King <peff@xxxxxxxx>

It is customary for your sign-off to be last.

More below...

> ---
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 98c103f..e951bd9 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -148,6 +147,24 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
>         return 0;
>  }
>
> +static int is_git_repository(struct strbuf *path)
> +{
> +       int ret = 0;
> +       if (is_git_directory(path->buf))
> +               ret = 1;
> +       else {
> +               int orig_path_len = path->len;
> +               if (path->buf[orig_path_len - 1] != '/')

Minor: I don't know how others feel about it, but I always find it a
bit disturbing to see a potential negative array access without a
safety check that orig_path_len is not 0, either directly in the
conditional or as a documenting assert().

> +                       strbuf_addch(path, '/');
> +               strbuf_addstr(path, ".git");
> +               if (is_git_directory(path->buf))
> +                       ret = 1;
> +               strbuf_setlen(path, orig_path_len);
> +       }
> +
> +       return ret;
> +}
> +
>  static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>                 int dry_run, int quiet, int *dir_gone)
>  {
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]