On Wed, 6 Sep 2006, A Large Angry SCM wrote:
> Jon Smirl wrote:
> > On 9/6/06, A Large Angry SCM <gitzilla@xxxxxxxxx> wrote:
> >> TREE objects do not delta or deflate well.
> >
> > I can understand why they don't deflate, the path names are pretty
> > much unique and the sha1s are incompressible. But why don't they delta
> > well? Does sorting them by size mess up the delta process?
>
> My guess would be the TREEs would only delta well against other TREE
> versions for the same path.

That's what you'd normally have in a real project, though.

I wonder if your "pack mashup" lost the normal behaviour: we very much
sort trees together normally, thanks to the "sort-by-filename, then by
size" behaviour that git-pack-objects should have (for trees, the size
normally shouldn't change, so the sorting should basically boil down to
"sort the same directory together, keeping the ordering it had from
git-rev-list").

Btw, that "keeping the ordering it had" part I'm not convinced we actually
enforce. That would depend on the sort algorithm used by "qsort()", I
think. So there might be room for improvement there in order to keep
things in recency order.

> Just looking at the structures in non-BLOBS, I see a lot of potential
> for the use of a set of dictionaries when deflating TREEs and another set
> of dictionaries when deflating COMMITs and TAGs. The low hanging fruit
> is to create dictionaries of the most referenced IDs across all TREE or
> COMMIT/TAG objects.

Is there any way to get zlib to just generate a suggested dictionary from
a given set of input?

		Linus