Re: git status: small difference between stating whole repository and small subdirectory

Piotr Krukowiecki <piotr.krukowiecki@xxxxxxxxxxxxx> · Fri, 10 Feb 2012 14:46:09 +0100

On Fri, Feb 10, 2012 at 1:33 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Fri, Feb 10, 2012 at 4:42 PM, Piotr Krukowiecki
> <piotr.krukowiecki@xxxxxxxxx> wrote:
>> Hi,
>>
>> I compared stating whole tree vs one small subdirectory, and I
>> expected that for the subdirectory status will be very very fast.
>> After all, it has only few files to stat. But it's not fast. Why?
>
> Because stat'ing is not the only thing git-status does? In order to
> find out staged changes, unstaged changes and untracked files, it has
> to do the equivalence of "git diff --cached", "git diff" and "git
> ls-files -o". I think copy detection is also enabled, which uses more
> cycles.

I believe copy detection is not done, neither for tracked nor untracked files.
Rename detection is done for tracked files. In this case it should not
matter, as there were no changes to the files.

My point is, that for such small number of small files (55 files and
223KB), which had no changes at all, the status took a lot of time (17
seconds) and doing status on whole repository which has more than
2000x files and 10000x data took only 2x more time.

I don't think that the algorithm scales so well, so my guess is that
'status' is so inefficient for subdirectories (i.e. the "git status --
dir" filter does not filter as much as it could).

> Profiling it should give you a good idea what parts cost most.

I'll try that later.

BTW by stating I meant using "status", not that it uses stat() or whatever.

-- 
Piotr Krukowiecki
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html