Re: [PATCH 3/3] clean: improve performance when removing lots of directories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 7, 2015 at 12:10 AM, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> On Mon, Apr 6, 2015 at 7:48 AM, Erik Elfström <erik.elfstrom@xxxxxxxxx> wrote:
>> Before this change, clean used resolve_gitlink_ref to check for the
>> presence of nested git repositories. This had the drawback of creating
>> a ref_cache entry for every directory that should potentially be
>> cleaned. The linear search through the ref_cache list caused a massive
>> performance hit for large number of directories.
>>
>> Teach clean.c:remove_dirs to use setup.c:is_git_directory
>> instead. is_git_directory will actually open HEAD and parse the HEAD
>> ref but this implies a nested git repository and should be rare when
>> cleaning.
>>
>> Using is_git_directory should give a more standardized check for what
>> is and what isn't a git repository but also gives a slight behavioral
>> change. We will now detect and respect bare and empty nested git
>> repositories (only init run). Update t7300 to reflect this.
>>
>> The time to clean an untracked directory containing 100000 sub
>> directories went from 61s to 1.7s after this change.
>
> Impressive.
>
>> Signed-off-by: Erik Elfström <erik.elfstrom@xxxxxxxxx>
>> Helped-by: Jeff King <peff@xxxxxxxx>
>
> It is customary for your sign-off to be last.
>
> More below...
>
>> ---
>> diff --git a/builtin/clean.c b/builtin/clean.c
>> index 98c103f..e951bd9 100644
>> --- a/builtin/clean.c
>> +++ b/builtin/clean.c
>> @@ -148,6 +147,24 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
>>         return 0;
>>  }
>>
>> +static int is_git_repository(struct strbuf *path)
>> +{
>> +       int ret = 0;
>> +       if (is_git_directory(path->buf))
>> +               ret = 1;
>> +       else {
>> +               int orig_path_len = path->len;
>> +               if (path->buf[orig_path_len - 1] != '/')
>
> Minor: I don't know how others feel about it, but I always find it a
> bit disturbing to see a potential negative array access without a
> safety check that orig_path_len is not 0, either directly in the
> conditional or as a documenting assert().
>


I think I would prefer to accept empty input and return false rather
than assert. What to you think about:

static int is_git_repository(struct strbuf *path)
{
    int ret = 0;
    size_t orig_path_len = path->len;
    if (orig_path_len == 0)
        ret = 0;
    else if (is_git_directory(path->buf))
        ret = 1;
    else {
        if (path->buf[orig_path_len - 1] != '/')
            strbuf_addch(path, '/');
        strbuf_addstr(path, ".git");
        if (is_git_directory(path->buf))
            ret = 1;
        strbuf_setlen(path, orig_path_len);
    }

    return ret;
}


Also I borrowed this pattern from remove_dirs and it has the same
problem. Should I add something like this as a separate commit?

diff --git a/builtin/clean.c b/builtin/clean.c
index ccffd8a..88850e3 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -173,7 +173,8 @@ static int remove_dirs(struct strbuf *path, const
char *prefix, int force_flag,
        DIR *dir;
        struct strbuf quoted = STRBUF_INIT;
        struct dirent *e;
-       int res = 0, ret = 0, gone = 1, original_len = path->len, len;
+       int res = 0, ret = 0, gone = 1;
+       size_t original_len = path->len, len;
        struct string_list dels = STRING_LIST_INIT_DUP;

        *dir_gone = 1;
@@ -201,6 +202,7 @@ static int remove_dirs(struct strbuf *path, const
char *prefix, int force_flag,
                return res;
        }

+       assert(original_len > 0 && "expects non-empty path");
        if (path->buf[original_len - 1] != '/')
                strbuf_addch(path, '/');


>> +                       strbuf_addch(path, '/');
>> +               strbuf_addstr(path, ".git");
>> +               if (is_git_directory(path->buf))
>> +                       ret = 1;
>> +               strbuf_setlen(path, orig_path_len);
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>>  static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>>                 int dry_run, int quiet, int *dir_gone)
>>  {
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]