On Thursday, 8 July 2010 22:13, Nicolas Pitre wrote:
> On Thu, 8 Jul 2010, Avery Pennarun wrote:
> > On Thu, Jul 8, 2010 at 3:29 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> > > I might be looking at this from my own perspective as one of the few
> > > people who hacked extensively on the Git pack format from the very
> > > beginning. But I do see a way for the pack format to encode commit and
> > > tree objects so that walking them would be a simple lookup in the pack
> > > index file where both the SHA1 and offset in the pack for each parent
> > > can be immediately retrieved. Same for tree references. No deflating
> > > required, no binary search, just simple dereferences. And the pack size
> > > would even shrink as a side effect.
> >
> > One trick that bup uses is an additional file that sits alongside the
> > pack and acts as an index. In bup's case, this is to work around
> > deficiencies in the .idx file format when using ridiculously huge
> > numbers of objects (hundreds of millions) across a large number of
> > packfiles. But the same concept could apply here: instead of doing
> > something like rev-cache, you could just construct the "efficient"
> > part of the packv4 format (which I gather is entirely related to
> > commit messages), and store it alongside each pack.
>
> No. I want the essential information in an efficient encoding _inside_
> the pack, actually replacing the existing encoding. One of the goals is
> also to reduce repository size, not to grow it.

That's a good idea.

> > This would allow people to incrementally modify git to use the new,
> > efficient commit object storage, without breaking backward
> > compatibility with earlier versions of git. (Just as bup can index
> > huge numbers of packed objects but still stores them in the plain git
> > pack format.)
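To make the "simple dereferences" idea above concrete, here is a toy model in C. It is only my own sketch under stated assumptions, not the actual pack v4 encoding: the names idx_entry, v4_commit and commit_parent are hypothetical, and the real .idx format is more involved (a fan-out table plus separate SHA-1 and offset tables). The point is just that if a commit stores its parents as positions in the index table, rather than as hex SHA-1s inside deflated text, both the parent's SHA-1 and its pack offset come from a single array access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (an assumption, NOT the real pack v4 layout): the index is a
 * table of object names sorted by SHA-1, each paired with the offset of
 * that object inside the pack. */
struct idx_entry {
	unsigned char sha1[20];
	uint64_t pack_offset;
};

/* Hypothetical compact commit record: instead of "parent <40 hex chars>"
 * lines inside a deflated buffer, each parent is stored as its position
 * in the index table above. */
struct v4_commit {
	uint32_t nr_parents;
	uint32_t parent_pos[2];	/* toy limit: at most two parents */
};

/* Return the index entry of parent n, or NULL if there is no such parent.
 * The parent's SHA-1 and its pack offset both come from one array access:
 * no inflate() and no binary search over the index. */
static const struct idx_entry *commit_parent(const struct idx_entry *idx,
					     size_t nr_entries,
					     const struct v4_commit *c,
					     uint32_t n)
{
	if (n >= c->nr_parents)
		return NULL;
	/* a corrupt record could point past the end of the table */
	assert(c->parent_pos[n] < nr_entries);
	return &idx[c->parent_pos[n]];
}
```

For example, a merge commit recorded as {2, {0, 2}} resolves its second parent straight to idx[2], and asking for a third parent yields NULL.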
>
> Initially, what I'm aiming for is for pack-objects to produce the new
> format, for index-pack to grok it, and for sha1_file:unpack_entry() to
> simply regenerate the canonical object format whenever a pack v4 object
> is encountered. Also pack-objects would be able to revert the object
> encoding to the current format on the fly when it is serving a fetch
> request to a client which is not pack v4 aware, just like we do now with
> the ofs-delta capability.
>
> Once that stage is reached, I'll submit the lot and hope that other
> people will help incrementally converting part of Git to benefit from
> native access to the pack v4 data. The tree object walk code would be
> the first obvious candidate. And so on.

If I remember correctly, with pack v4 some operations, like getting the
size of a tree object, require regenerating the object in the current
(canonical) format, so they are slower than they should be (and perhaps
a bit slower than in the current implementation). But I think that should
be rare (unless one streams objects to 'git cat-file --batch' or
'git cat-file --batch-check').

Would pack v4 need index v4?

By the way, the rev-cache project was started mainly to make the
"counting objects" phase of clone / fetch faster. Would pack v4 offer
the same speedup without rev-cache?

-- 
Jakub Narebski
Poland