Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking

Hrmm, I just realized that it doesn't actually cache paths/names...
This obviously has no bearing on its use in packing, but I should
either add that in or restrict usage in non-packing-related walks.
Weird how things like that escape you.

I think I may go ahead and add support for this tomorrow.  It should
have no effect on performance and very little impact on cache slice
size.

On Thu, Aug 6, 2009 at 10:01 PM, Nick Edelen <sirnot@xxxxxxxxx> wrote:
> Hi,
>
>>My idea with that was that you already have a SHA-1 map in the pack index,
>>and if all you want to be able to accelerate the revision walker, you'd
>>probably need something that adds yet another mapping, from commit to
>>parents and tree, and from tree to sub-tree and blob (so you can avoid
>>unpacking commit and tree objects).
>
> As I mention in one of the patch descriptions, along with each commit
> a list of objects introduced per commit is cached, so no extra I/O is
> necessary for tree recursion, etc. during traversal.
>
>>I just thought that it could be more efficient to do it at the time the
>>pack index is written _anyway_, as nothing will change in the pack after
>>that anyway.
>
> Nothing might change in the pack, but the slices were made to allow
> for continual addition and refinement of the cache.  In a typical
> usage slices will be added and fused on a regular basis, which would
> require tinkering in pack indexes if they were combined.
>
>>But I guess I can answer my question easily myself: the boundary commits
>>will not be handled that way.
>>
>>Still, there is some redundancy between the pack index and your cache, as
>>you have to write out the whole list of SHA-1s all over again.  I guess it
>>is time to look at the code instead of asking stupid questions.
>
> The whole revision cache is redundant, technically speaking: nothing
> in it can't be found by rummaging through packs or objects.  The point
> of it was to distill out important information for fast, easy access
> of the commit tree.
>
> On another note, I've eliminated the Python dependency.  Shall I
> resend the patchset now or should I wait until it has been further
> reviewed? (don't want to flood the list with resubmits)
>
>  - Nick
>
> On Thu, Aug 6, 2009 at 9:06 PM, Johannes
> Schindelin <Johannes.Schindelin@xxxxxx> wrote:
>> Hi,
>>
>> On Thu, 6 Aug 2009, Nick Edelen wrote:
>>
>>> > Sorry, I forgot the details, could you quickly remind me why these
>>> > caches are not in the pack index files?
>>>
>>> Er, I'm not sure what you mean.  Are you asking why these revision
>>> caches are required if we have a pack index, or why they aren't in the
>>> pack index, or something different?  I'm thinking probably the second:
>>
>> Yep.
>>
>>> the short answer is that cache slices are totally independent of pack
>>> files.
>>
>> My idea with that was that you already have a SHA-1 map in the pack index,
>> and if all you want to be able to accelerate the revision walker, you'd
>> probably need something that adds yet another mapping, from commit to
>> parents and tree, and from tree to sub-tree and blob (so you can avoid
>> unpacking commit and tree objects).
>>
>> I just thought that it could be more efficient to do it at the time the
>> pack index is written _anyway_, as nothing will change in the pack after
>> that anyway.
>>
>> But I guess I can answer my question easily myself: the boundary commits
>> will not be handled that way.
>>
>> Still, there is some redundancy between the pack index and your cache, as
>> you have to write out the whole list of SHA-1s all over again.  I guess it
>> is time to look at the code instead of asking stupid questions.
>>
>>> It might be possible to somehow merge revision cache slices with pack
>>> indexes, but I don't think it'd be a very suitable modification.  The
>>> rev-cache slices are meant to act almost like topo-relation pack files:
>>> new slices are simply new files, separate slice files can be fused
>>> ("repacked") into a larger one, the index is a (recreatable) single file
>>> associating file (positions) with objects.  The format was geared to
>>> reducing potential cache/data loss and preventing overly large cache
>>> slices.
>>>
>>> >> Hmpf.
>>> >>
>>> >> We got rid of the last Python script in Git a long time ago, but now
>>> >> two different patch series try to sneak that dependency (at least for
>>> >> testing) back in.
>>> >>
>>> >> That's all the worse because we cannot use Python in msysGit, and
>>> >> Windows should be a platform benefitting dramatically from your work.
>>> >
>>> > In fact, the test the script performs could be easily rephrased with
>>> > "sort", "uniq" and "comm". OTOH: If the walker is supposed to return
>>> > the exact same ordered list of commits, you can just use test_cmp.
>>>
>>> The language that script is written in isn't important.  I originally
>>> wrote it in Python because I wanted something quick and wasn't much of
>>> a sh guru (sorry :-/ ).  As Michael said, I've no doubt it can easily
>>> be converted to a shell script
>>
>> That is not what I wanted to hear.
>>
>>> -- in fact, I'll try to get a shell version working today.
>>
>> That is.
>>
>> Thanks,
>> Dscho
>>
>
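
The shell-only check suggested in the quoted discussion (comparing two walks with "sort" and "comm" rather than a Python script) could look roughly like the sketch below. The helper name same_objects and the ".sorted" scratch files are hypothetical and are not taken from the patch series; the sketch assumes each input file holds one object name per line.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch series): succeed if two files
# list the same set of lines, ignoring order and duplicates.
# Lines found in only one of the files are printed on failure.
same_objects () {
	# comm requires sorted input; -u also drops duplicate lines
	sort -u "$1" >"$1.sorted" &&
	sort -u "$2" >"$2.sorted" &&
	# comm -3 suppresses lines common to both files, leaving differences
	only=$(comm -3 "$1.sorted" "$2.sorted") &&
	if test -n "$only"
	then
		printf '%s\n' "$only"
		return 1
	fi
}
```

The two files would hold the output of an uncached and a cached walk; how the cached walk is invoked is defined by the patch series and is not shown here. If the cached walker is expected to reproduce the exact order as well, a plain file comparison such as test_cmp is the simpler check.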