Re: [PATCH 0/3] On compressing large index

FWIW, regarding 'git ls-files' specifically: since it is a relatively
rare operation, it's probably OK if it's a bit slow.  I know you chose it
as a good benchmark of index reading performance.  I just mention it
because, in some hypothetical wild-and-crazy world in which we had a
git-aware file system layer, one could imagine doing away with most of the
index file and querying the file system for info on what's changed, the
SHA1 of subtrees, etc.

Do you have a sense of which operations on the index are high-value pain
points for large repositories?  I can imagine things like 'git-add' and
'git-commit', but I'm not super familiar with other common operations it
has a role in.

Josh


On 2/5/12 8:35 PM, "Nguyen Thai Ngoc Duy" <pclouds@xxxxxxxxx> wrote:

>2012/2/6 Thomas Rast <trast@xxxxxxxxxxx>:
>>> We need to figure out what git uses 4s user time for.
>>
>> When I worked on the cache-tree stuff, my observation (based on
>> profiling, so I had actual data :-) was that computing SHA1s absolutely
>> dominates everything in such operations.  It does that when writing the
>> index to write the trailing checksum, and also when loading it to verify
>> that the index is valid.
>
>You're right. This is on another machine but with same index (2M
>files), without SHA1 checksum:
>
>$ time ~/w/git/git ls-files --stage|head > /dev/null
>real    0m1.533s
>user    0m1.228s
>sys     0m0.306s
>
>and with SHA-1 checksum:
>
>$ time git ls-files --stage|head > /dev/null
>real     0m7.525s
>user    0m7.257s
>sys     0m0.268s
>
>I guess we could fall back to cheaper digests for such a large index.
>Still, more than one second spent doing nothing but reading the index is
>too slow for me.
>
>> ls-files shouldn't be so slow though.  A quick run with callgrind in a
>> linux-2.6.git tells me it spends about 45% of its time on SHA1s and a
>> whopping 25% in quote_c_style().  I wonder what's so hard about
>> quoting...
>
>That's why I put "| head" there, to cut output processing overhead
>(hopefully).
>-- 
>Duy
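
To make Thomas's point concrete: the index ends with a trailer that is the SHA-1 of every byte before it, so recomputing it on write and verifying it on read both mean hashing the entire file, and that cost grows linearly with the number of entries. Below is a minimal Python sketch of that check, assuming a SHA-1 repository (20-byte trailer) and treating the file as an opaque byte string. The function names are hypothetical and this is not git's own C code.

```python
import hashlib

HASH_LEN = 20  # SHA-1 digest size; assumes a SHA-1 (not SHA-256) repository


def append_checksum(body: bytes) -> bytes:
    """Return body followed by the SHA-1 of body (the trailing index checksum)."""
    return body + hashlib.sha1(body).digest()


def checksum_ok(data: bytes) -> bool:
    """Hash everything except the trailer and compare the result with the trailer.

    Every byte of the file goes through SHA-1, which is why this step
    scales with the size of the index.
    """
    if len(data) < HASH_LEN:
        return False
    body, trailer = data[:-HASH_LEN], data[-HASH_LEN:]
    return hashlib.sha1(body).digest() == trailer


if __name__ == "__main__":
    # 12-byte header: "DIRC" signature, version 2, zero entries.
    header = b"DIRC" + (2).to_bytes(4, "big") + (0).to_bytes(4, "big")
    data = append_checksum(header)
    print(checksum_ok(data))                             # True
    print(checksum_ok(bytes([data[0] ^ 1]) + data[1:]))  # False: one bit flipped
```

To try it against a real repository, read `.git/index` in binary mode and pass the bytes to `checksum_ok`.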

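Duy's idea of falling back to a cheaper digest for very large indexes can be explored with a quick measurement. The sketch below assumes CRC32 (Python's `zlib.crc32`) as the cheaper candidate; that choice is ours for illustration, not something proposed in the thread. CRC32 is good at catching accidental corruption such as a truncated or partially written file, but unlike SHA-1 it is not collision-resistant. Whether it actually wins, and by how much, depends on the machine and on how the underlying libraries were built, so treat the result as something to measure rather than assume.

```python
import hashlib
import time
import zlib
from typing import Callable, Dict


def best_time(fn: Callable[[bytes], object], buf: bytes, repeat: int = 3) -> float:
    """Return the fastest of `repeat` wall-clock timings of fn(buf), in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(buf)
        best = min(best, time.perf_counter() - start)
    return best


def compare_digests(size_mb: int = 64) -> Dict[str, float]:
    """Time SHA-1 against CRC32 over a zero-filled buffer standing in for an index."""
    buf = bytes(size_mb * 1024 * 1024)
    return {
        "sha1": best_time(lambda b: hashlib.sha1(b).digest(), buf),
        "crc32": best_time(zlib.crc32, buf),
    }


if __name__ == "__main__":
    for name, seconds in compare_digests().items():
        print(f"{name:6s} {seconds:.3f}s")
```

The 64 MiB buffer is an arbitrary stand-in; passing the size of your own `.git/index` gives a more realistic comparison.
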
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html