Re: Index files autocompletion too slow in big repositories (w/ suggestion for improvement)

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 16 Apr 2017 21:05:35 -0700

Johannes Sixt <j6t@xxxxxxxx> writes:

> Cc Gábor.
>
> On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita <carlosjosepita@xxxxxxxxx> wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys    0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>
> This is about command line completion. We go a long way to avoid
> forking processes there. What is 10x faster on Linux despite
> forking a process may not be so on Windows.

Doesn't this depend on how many paths there are?  With only a few
paths, I suspect the loop in shell would beat a pipe into sed even
on Linux.  With tons of paths, past some number the loop in shell
would become slower than spawning sed once, even on platforms with
slower fork, no?
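
To make the comparison concrete, here is a rough sketch of the two
approaches side by side.  The function names are made up for
illustration and neither is the actual code in git-completion.bash;
both read paths on stdin, keep only the leading path component, and
collapse adjacent duplicates.

```shell
# Hypothetical helpers for illustration; not part of git-completion.bash.

# Pure-shell variant: forks nothing, but runs one loop iteration per path.
first_components_loop () {
    local path prev=
    while IFS= read -r path; do
        path=${path%%/*}        # drop everything from the first slash on
        if [ "$path" != "$prev" ]; then
            printf '%s\n' "$path"
            prev=$path
        fi
    done
}

# sed variant: pays for spawning sed and uniq once, then the per-path
# work happens outside the shell.
first_components_sed () {
    sed 's@/.*@@' | uniq
}
```

Timing something like "git ls-files | first_components_loop >/dev/null"
against the sed variant, in repositories of different sizes and on
each platform, should show where the crossover is, if there is one.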