On 9/1/2022 10:17 AM, Junio C Hamano wrote:
> Shaoxuan Yuan <shaoxuan.yuan02@xxxxxxxxx> writes:
>
>> Before this patch, whenever --sparse is used, `git-grep` utilizes the
>> ensure_full_index() method to expand the index and search all the
>> entries. Because this method requires walking all the trees and
>> constructing the index, it is the slow part within the whole command.
>>
>> To achieve better performance, this patch uses grep_tree() to search the
>> sparse directory entries and get rid of the ensure_full_index() method.
>
> When you encounter a "sparsedir" (i.e. a tree recorded in index),
> you should know the path leading to that directory. Even though I no
> longer remember the details of the implementations of grep_$where()
> which I did long time ago, I think grep_tree() should know how to
> pass the leading path down, as that is the most natural way to
> implement the recursive behaviour. This patch should be able to
> piggyback on that.
Yes, though this commit [1] from 6 years ago started to assume that
grep_tree() only accepts root tree or commit, so the function fails
to process a tree like "sparsedir". It's the pathspec matching base that
was messed up. The support for a tree that is not at root-level was
added in this series.
[1] 74ed43711fd1cd7ce155d338f87ebe52cb74d9e2
>> @@ -537,8 +534,26 @@ static int grep_cache(struct grep_opt *opt,
>>
>> strbuf_setlen(&name, name_base_len);
>> strbuf_addstr(&name, ce->name);
>> + if (S_ISSPARSEDIR(ce->ce_mode)) {
>> + enum object_type type;
>> + struct tree_desc tree;
>> + void *data;
>> + unsigned long size;
>> + struct strbuf base = STRBUF_INIT;
>> +
>> + strbuf_addstr(&base, ce->name);
>> +
>> + data = read_object_file(&ce->oid, &type, &size);
>> + init_tree_desc(&tree, data, size);
>>
>> + /*
>> + * sneak in the ce_mode using check_attr parameter
>> + */
>> + hit |= grep_tree(opt, pathspec, &tree, &base,
>> + base.len, ce->ce_mode);
>
> OK. Instead of inventing a new "base" strbuf, we could reuse
> existing name while running the grep_tree() and restore it after it
> returns, and I suspect that the end result would be more in line
> with how grep_cache() uses that "name" buffer for all the cache
> entries. But that is not a correctness issue (it is move about
> preventing from making the code worse).
Oh right, thanks for the suggestion!
>> @@ -598,7 +613,14 @@ static int grep_tree(struct grep_opt *opt,
const struct pathspec *pathspec,
>> int te_len = tree_entry_len(&entry);
>>
>> if (match != all_entries_interesting) {
>> - strbuf_addstr(&name, base->buf + tn_len);
>> + if (S_ISSPARSEDIR(check_attr)) {
>> + // object is a sparse directory entry
>
> No // comments, please.
OK.
Thanks,
Shaoxuan