Re: A look at some alternative PACK file encodings

Linus Torvalds wrote:
> 
> On Wed, 6 Sep 2006, A Large Angry SCM wrote:
> 
>> Jon Smirl wrote:
>>> On 9/6/06, A Large Angry SCM <gitzilla@xxxxxxxxx> wrote:
>>>> TREE objects do not delta or deflate well.
>>> I can understand why they don't deflate, the path names are pretty
>>> much unique and the sha1s are incompressible. But why don't they delta
>>> well? Does sorting them by size mess up the delta process?
>> My guess would be the TREEs would only delta well against other TREE
>> versions for the same path.
> 
> That's what you'd normally have in a real project, though. I wonder if 
> your "pack mashup" lost the normal behaviour: we very much sort trees 
> together normally, thanks to the "sort-by-filename, then by size" 
> behaviour that git-pack-objects should have (for trees, the size normally 
> shouldn't change, so the sorting should basically boil down to "sort the 
> same directory together, keeping the ordering it had from git-rev-list").

The mashup is just all the projects in a single repository with a bushy
refs tree so I can view the updates in a single gitk window.

Sorting by name, then by size, may be breaking the object version
relationship for wide graphs.

> Btw, that "keeping the ordering it had" part I'm not convinced we actually 
> enforce. That would depend on the sort algorithm used by "qsort()", I 
> think. So there might be room for improvement there in order to keep 
> things in recency order.

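qsort() is not guaranteed to be stable: the C standard leaves the
relative order of elements that compare equal unspecified.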
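
One way to keep recency order regardless of the sort algorithm is to
make the original position the final tie-breaker, so that no two entries
ever compare equal. A rough sketch of the idea follows, in Python rather
than C for brevity. The (oid, name, size) tuples and the function name
pack_order are made up for illustration and are not what
git-pack-objects actually uses. Python's own sort happens to be stable,
so the index is redundant here; it is spelled out because an equivalent
qsort() comparator would need it.

```python
def pack_order(objects):
    """Sort by name, then size, keeping input order among ties.

    `objects` is a list of hypothetical (oid, name, size) tuples, given
    in the order they came from rev-list (most recent first).
    """
    # Attach the original position to every entry.
    decorated = [(name, size, idx, oid)
                 for idx, (oid, name, size) in enumerate(objects)]
    # With idx in the key, no two entries compare equal, so even an
    # unstable sort could not reorder entries that tie on (name, size).
    decorated.sort(key=lambda e: (e[0], e[1], e[2]))
    return [(oid, name, size) for name, size, _idx, oid in decorated]


if __name__ == "__main__":
    rev_list_order = [
        ("t3", "src", 100),      # newest version of the "src" tree
        ("b1", "Makefile", 50),
        ("t2", "src", 100),
        ("t1", "src", 100),      # oldest version of the "src" tree
    ]
    for entry in pack_order(rev_list_order):
        print(entry)
```

The three "src" trees end up adjacent and still newest-first, which is
the property the delta window wants.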

>> Just looking at the structures in non-BLOBS, I see a lot of potential
>> for the use of a set of dictionaries when deflating TREEs and another set
>> of dictionaries when deflating COMMITs and TAGs. The low hanging fruit
>> is to create dictionaries of the most referenced IDs across all TREE or
>> COMMIT/TAG objects.
>
> Is there any way to get zlib to just generate a suggested dictionary from 
> a given set of input?

The docs suggest "no".
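
zlib will accept a preset dictionary, though, so the dictionary can be
built by hand. A minimal sketch of the "most referenced IDs" idea,
using Python's zlib module: the helper names (tree_entries,
build_id_dictionary, fake_tree) and the synthetic trees are assumptions
for illustration only, and the parser assumes the raw tree body layout
"<mode> <name>\0<20-byte sha1>". This is not an existing git feature.

```python
import hashlib
import zlib
from collections import Counter


def tree_entries(tree):
    """Yield (mode_and_name, sha1) for each entry of a raw tree body."""
    pos = 0
    while pos < len(tree):
        nul = tree.index(b"\0", pos)          # end of "<mode> <name>"
        yield tree[pos:nul], tree[nul + 1:nul + 21]
        pos = nul + 21                        # skip the 20-byte sha1


def build_id_dictionary(trees, max_ids=64):
    """Concatenate the most frequently referenced sha1s into a dictionary."""
    counts = Counter(sha for t in trees for _, sha in tree_entries(t))
    ranked = [sha for sha, _ in counts.most_common(max_ids)]
    # The zlib manual recommends putting the most commonly used strings
    # toward the end of the dictionary, so the most common ID goes last.
    return b"".join(reversed(ranked))


def deflate(data, zdict=None):
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def inflate(data, zdict=None):
    d = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return d.decompress(data) + d.flush()


def fake_tree(names, version):
    """Synthetic tree: one entry changes per version (illustration only)."""
    body = b""
    for i, name in enumerate(names):
        rev = version if i == version % len(names) else 0
        sha = hashlib.sha1(f"{name}-{rev}".encode()).digest()
        body += b"100644 " + name.encode() + b"\0" + sha
    return body


if __name__ == "__main__":
    names = ["Makefile", "README", "cache.h", "diff.c", "tree.c", "tag.c"]
    trees = [fake_tree(names, v) for v in range(10)]
    zdict = build_id_dictionary(trees)
    plain = sum(len(deflate(t)) for t in trees)
    primed = sum(len(deflate(t, zdict)) for t in trees)
    print("without dictionary:", plain, "bytes; with:", primed, "bytes")
```

The catch is that the same dictionary has to be available when
inflating, so it would have to be stored in (or derivable from) the
pack itself.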