Re: Why is "git tag --contains" so slow?

On Thu, Jul 8, 2010 at 3:29 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Thu, 8 Jul 2010, Jakub Narebski wrote:
>> By the way, what had happened to the rev-cache project from GSoC 2009?
>
> I think the person who worked on it did so in good faith, and that the
> result is well done.
>
> But I personally cannot convince myself this is fundamentally the right
> solution to the issue.  Maintaining another data structure to work
> around deficiencies in the primary data structure doesn't sound right
> to me.
>
> I might be looking at this from my own perspective as one of the few
> people who hacked extensively on the Git pack format from the very
> beginning.  But I do see a way for the pack format to encode commit and
> tree objects so that walking them would be a simple lookup in the pack
> index file where both the SHA1 and offset in the pack for each parent
> can be immediately retrieved.  Same for tree references.  No deflating
> required, no binary search, just simple dereferences.  And the pack size
> would even shrink as a side effect.

One trick that bup uses is an additional file that sits alongside the
pack and acts as an index.  In bup's case, this is to work around
deficiencies in the .idx file format when using ridiculously huge
numbers of objects (hundreds of millions) across a large number of
packfiles.  But the same concept could apply here: instead of doing
something like rev-cache, you could just construct the "efficient"
part of the packv4 format (which I gather is entirely related to
commit messages), and store it alongside each pack.

This would allow people to incrementally modify git to use the new,
efficient commit object storage, without breaking backward
compatibility with earlier versions of git.  (Just as bup can index
huge numbers of packed objects but still stores them in the plain git
pack format.)
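
To make the idea a bit more concrete, here is a rough sketch in Python
of what such a side file might hold.  Everything in it is made up for
illustration: the layout, and the names build_sidecar, Sidecar and
reachable, are hypothetical and are not bup's index format, not packv4,
and not anything git implements.  It assumes the commit-to-parents
mapping has already been extracted from the pack by some other means.
Once the starting commit is located, following parents is just integer
dereferences into the same file, with no deflating.

```python
import struct


def build_sidecar(parents_by_commit):
    """Serialize {sha: [parent_sha, ...]} (20-byte keys) into one blob.

    Hypothetical layout: count, sorted SHA table, offsets table,
    then a flat list of parent positions in the SHA table.
    """
    shas = sorted(parents_by_commit)
    index_of = {sha: i for i, sha in enumerate(shas)}
    offsets, flat = [0], []
    for sha in shas:
        # Parents that live in another pack are dropped in this sketch.
        flat.extend(index_of[p] for p in parents_by_commit[sha]
                    if p in index_of)
        offsets.append(len(flat))
    parts = [struct.pack(">I", len(shas))]
    parts.extend(shas)
    parts.append(struct.pack(">%dI" % len(offsets), *offsets))
    parts.append(struct.pack(">%dI" % len(flat), *flat))
    return b"".join(parts)


class Sidecar:
    """Read-only view over a blob produced by build_sidecar()."""

    def __init__(self, blob):
        (self.count,) = struct.unpack_from(">I", blob, 0)
        self.blob = blob
        self.sha_base = 4
        self.off_base = self.sha_base + 20 * self.count
        self.par_base = self.off_base + 4 * (self.count + 1)

    def sha(self, i):
        start = self.sha_base + 20 * i
        return self.blob[start:start + 20]

    def find(self, sha):
        """Binary search, needed only once to locate a starting commit."""
        lo, hi = 0, self.count
        while lo < hi:
            mid = (lo + hi) // 2
            if self.sha(mid) < sha:
                lo = mid + 1
            else:
                hi = mid
        if lo < self.count and self.sha(lo) == sha:
            return lo
        return None

    def parents(self, i):
        """Return the parent positions of commit i as a tuple of ints."""
        start, end = struct.unpack_from(">2I", self.blob,
                                        self.off_base + 4 * i)
        if start == end:
            return ()
        return struct.unpack_from(">%dI" % (end - start), self.blob,
                                  self.par_base + 4 * start)


def reachable(sidecar, tip_sha, target_sha):
    """True if target_sha is an ancestor of (or equal to) tip_sha."""
    start = sidecar.find(tip_sha)
    goal = sidecar.find(target_sha)
    if start is None or goal is None:
        return False
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        if i == goal:
            return True
        for p in sidecar.parents(i):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False
```

A "tag --contains" style check would then be one reachable() call per
tag against the side file, while the pack itself stays untouched and
readable by older versions of git.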

Just a thought.  Thinking of it this way might make it easier to get
over the inertia of an entirely new packfile format.

Have fun,

Avery