Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking

Johannes Schindelin venit, vidit, dixit 06.08.2009 16:48:
> Hi,
> 
> On Thu, 6 Aug 2009, Nick Edelen wrote:
> 
>> SUGGESTED FOR 'PU':
>>
>> Traversing objects is currently very costly, as every commit and tree must be 
>> loaded and parsed.  Much time and energy could be saved by caching metadata and 
>> topological info in an efficient, easily accessible manner.  Furthermore, this 
>> could improve git's interfacing potential, by providing a condensed summary of 
>> a repository's commit tree.
>>
>> This is a series to implement such a revision caching mechanism, aptly named 
>> rev-cache.  The series will provide:
>>  - a core API to manipulate and traverse caches
>>  - an integration into the internal revision walker
>>  - a porcelain front-end providing access to users and (shell) applications
>>  - a series of tests to verify/demonstrate correctness
>>  - documentation of the API, porcelain and core concepts
>>
>> In cold starts rev-cache has sped up packing and walking by a factor of 4, and 
>> over twice that on warm starts.  Some times on slax for the linux repository:
>>
>> rev-list --all --objects >/dev/null
>>  default
>>    cold    1:13
>>    warm    0:43
>>  rev-cache'd
>>    cold    0:19
>>    warm    0:02
>>
>> pack-objects --revs --all --stdout >/dev/null
>>  default
>>    cold    2:44
>>    warm    1:21
>>  rev-cache'd
>>    cold    0:44
>>    warm    0:10
> 
> Nice!
> 
>> The mechanism is minimally intrusive: most of the changes take place in 
>> separate files, and only a handful of git's existing functions are 
>> modified.
> 
> Sorry, I forgot the details, could you quickly remind me why these caches 
> are not in the pack index files?
> 
>>  Documentation/rev-cache.txt           |   51 +
>>  Documentation/technical/rev-cache.txt |  336 ++++++
>>  Makefile                              |    2 +
>>  blob.c                                |    1 +
>>  blob.h                                |    1 +
>>  builtin-rev-cache.c                   |  284 +++++
>>  builtin.h                             |    1 +
>>  commit.c                              |    3 +
>>  commit.h                              |    2 +
>>  git.c                                 |    1 +
>>  list-objects.c                        |   49 +-
>>  rev-cache.c                           | 1832 +++++++++++++++++++++++++++++++++
>>  revision.c                            |   89 ++-
>>  revision.h                            |   46 +-
>>  t/t6015-rev-cache-list.sh             |  228 ++++
>>  t/t6015-sha1-dump-diff.py             |   36 +
> 
> Hmpf.
> 
> We got rid of the last Python script in Git a long time ago, but now two 
> different patch series try to sneak that dependency (at least for testing) 
> back in.
> 
> That's all the worse because we cannot use Python in msysGit, and Windows 
> should be a platform benefitting dramatically from your work.

In fact, the test the script performs could be easily rephrased with
"sort", "uniq" and "comm".
OTOH: If the walker is supposed to return the exact same ordered list of
commits, you can just use test_cmp.
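
A rough sketch of both comparisons, to make the suggestion concrete. The
helper names same_set and same_list are made up for illustration and are
not part of the series or of git's test library; inside the test suite,
"test_cmp expect actual" would take the place of same_list.

```shell
#!/bin/sh
# Sketch only: same_set and same_list are hypothetical helper names.

# Succeeds when both files hold the same set of lines, in any order.
# Assumes no empty lines, which holds for lists of object names.
same_set () {
	LC_ALL=C sort "$1" | uniq >"$1.sorted" &&
	LC_ALL=C sort "$2" | uniq >"$2.sorted" &&
	# comm -3 drops lines common to both inputs,
	# so empty output means the two sets are equal.
	test -z "$(LC_ALL=C comm -3 "$1.sorted" "$2.sorted")"
}

# Succeeds only when both files are identical, order included.
same_list () {
	diff -u "$1" "$2" >/dev/null
}
```

With the output of the default walker in one file and the output of the
cached walker in another, same_set checks that the same objects were
returned, while same_list additionally requires the same order.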

Michael