Re: [PATCH v1 2/2] builtin/grep.c: integrate with sparse index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/24/22 5:06 PM, Shaoxuan Yuan wrote:
> On 8/17/2022 10:23 PM, Derrick Stolee wrote:
>> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>>> Turn on sparse index and remove ensure_full_index().

>>> -    /* TODO: audit for interaction with sparse-index. */
>>> -    ensure_full_index(repo->index);
>>> +    if (grep_sparse)
> 
> A side note: this condition should be `grep_sparse && cached`.
> 
>>> +        ensure_full_index(repo->index);
>>> +
>> As mentioned before, this approach is the simplest way to make the case
>> without --sparse faster, but the case _with_ --sparse will still be slow.
>> The way to fix this would be to modify this portion of the loop:
> 
> I'm not sure. If --sparse here means we want to expand the index, it
> is expected to be slow (ensure_full_index is slow), isn't it?
> 
>>     if (S_ISREG(ce->ce_mode) &&
>>         match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
>>                S_ISDIR(ce->ce_mode) ||
>>                S_ISGITLINK(ce->ce_mode))) {
>>
>> by adding an initial case
>>
>>     if (S_ISSPARSEDIR(ce->ce_mode)) {
>>         hit |= grep_tree(opt, &ce->oid, name.buf, 0, name.buf);
>>     } else if (S_ISREG(ce->ce_mode) &&
>>            match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
>>                   S_ISDIR(ce->ce_mode) ||
>>                   S_ISGITLINK(ce->ce_mode))) {
>>
>> and appropriately implement "grep_tree()" to walk the tree at ce->oid to
>> find all matching files within, then call grep_oid() for each of those
>> paths.
> 
> Tree walking is faster, yes. So, for this approach to be faster, I
> think you are suggesting we should not expand the index, even when
> --sparse is given? Instead, we just rely on the tree walking logic,
> right?

Yes. Tree walking is a sizeable portion of the cost of expanding the
index, but we also avoid constructing the new index _and_ we can use
the t1092 tests to show that we are satisfying the behavior without
resorting to ensure_full_index(). It shows that we are doing the "most
correct" thing.

Walking trees also provides the way to speed up when focused on a
pathspec, since maybe the pathspec reduces the scope of the tree
search automatically (from existing tree-walking logic). Expanding
the index means "walk all the trees, then scan all the files" when
there might be better things to do instead.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux