Re: [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/1/2022 10:17 AM, Junio C Hamano wrote:
> Shaoxuan Yuan <shaoxuan.yuan02@xxxxxxxxx> writes:
>
>> Before this patch, whenever --sparse is used, `git-grep` utilizes the
>> ensure_full_index() method to expand the index and search all the
>> entries. Because this method requires walking all the trees and
>> constructing the index, it is the slow part within the whole command.
>>
>> To achieve better performance, this patch uses grep_tree() to search the
>> sparse directory entries and get rid of the ensure_full_index() method.
>
> When you encounter a "sparsedir" (i.e. a tree recorded in index),
> you should know the path leading to that directory. Even though I no
> longer remember the details of the implementations of grep_$where()
> which I did long time ago, I think grep_tree() should know how to
> pass the leading path down, as that is the most natural way to
> implement the recursive behaviour.  This patch should be able to
> piggyback on that.

Yes, though this commit [1] from 6 years ago started to assume that
grep_tree() only accepts root tree or commit, so the function fails
to process a tree like "sparsedir". It's the pathspec matching base that
was messed up. The support for a tree that is not at root-level was
added in this series.

[1] 74ed43711fd1cd7ce155d338f87ebe52cb74d9e2

>> @@ -537,8 +534,26 @@ static int grep_cache(struct grep_opt *opt,
>>
>>          strbuf_setlen(&name, name_base_len);
>>          strbuf_addstr(&name, ce->name);
>> +        if (S_ISSPARSEDIR(ce->ce_mode)) {
>> +            enum object_type type;
>> +            struct tree_desc tree;
>> +            void *data;
>> +            unsigned long size;
>> +            struct strbuf base = STRBUF_INIT;
>> +
>> +            strbuf_addstr(&base, ce->name);
>> +
>> +            data = read_object_file(&ce->oid, &type, &size);
>> +            init_tree_desc(&tree, data, size);
>>
>> +            /*
>> +             * sneak in the ce_mode using check_attr parameter
>> +             */
>> +            hit |= grep_tree(opt, pathspec, &tree, &base,
>> +                     base.len, ce->ce_mode);
>
> OK.  Instead of inventing a new "base" strbuf, we could reuse
> existing name while running the grep_tree() and restore it after it
> returns, and I suspect that the end result would be more in line
> with how grep_cache() uses that "name" buffer for all the cache
> entries.  But that is not a correctness issue (it is move about
> preventing from making the code worse).

Oh right, thanks for the suggestion!

>> @@ -598,7 +613,14 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>>          int te_len = tree_entry_len(&entry);
>>
>>          if (match != all_entries_interesting) {
>> -            strbuf_addstr(&name, base->buf + tn_len);
>> +            if (S_ISSPARSEDIR(check_attr)) {
>> +                // object is a sparse directory entry
>
> No // comments, please.

OK.

Thanks,
Shaoxuan






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux