Multiblobs

Sergio Callegari <sergio.callegari@xxxxxxxxx> · Wed, 28 Apr 2010 15:12:07 +0000 (UTC)

Hi,

I happened to read an older post by Jeff King about "multiblobs"
(http://kerneltrap.org/mailarchive/git/2008/4/6/1360014) and I was wondering
whether the idea has been abandoned for some reason or just put on hold.

As far as I can tell, this would help considerably with
- storing large binary blobs (the split could happen with a rolling checksum
approach)
- storing "structured files", such as the many zip-based file formats
(Opendocument, Docx, Jar files, zip files themselves), tars (including
compressed tars), pdfs, etc, whose number is rising day after day...
- storing binary files with textual tags, where the tags could go in a separate
blob, making them much easier to read out, with no need to cache them in a
notes tree.
- etc...
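
To make the rolling checksum point a bit more concrete, here is a minimal
sketch in Python of how a large blob could be split at content-defined
boundaries. It is only an illustration of the general technique, not what git
or bup actually do: the polynomial hash, the constants (WINDOW, BASE, MOD,
MASK) and the function names are all my own assumptions.

```python
import hashlib

WINDOW = 48                       # bytes covered by the rolling hash (arbitrary)
BASE = 257                        # polynomial base (assumption)
MOD = (1 << 31) - 1               # prime modulus (assumption)
MASK = (1 << 12) - 1              # cut when the low 12 bits are all set
POW_OUT = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window


def split_chunks(data: bytes) -> list:
    """Cut data wherever the hash of the last WINDOW bytes matches MASK."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Slide the window: add the new byte, drop the one that fell out.
        h = h * BASE + byte
        if i >= WINDOW:
            h -= data[i - WINDOW] * POW_OUT
        h %= MOD
        # Require at least WINDOW bytes per chunk to avoid tiny chunks.
        if (h & MASK) == MASK and i + 1 - start >= WINDOW:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def demo_data(n_blocks: int = 4096) -> bytes:
    """Deterministic pseudo-random test data (128 KiB by default)."""
    return b"".join(hashlib.sha256(str(i).encode()).digest()
                    for i in range(n_blocks))


if __name__ == "__main__":
    data = demo_data()
    original = split_chunks(data)
    edited = split_chunks(b"XYZ" + data)   # small insertion at the front
    shared = set(original) & set(edited)
    print(len(original), len(edited), len(shared))
```

Because the cut points depend only on the last WINDOW bytes, a small edit near
the start of the file should disturb only the chunks around it, and the
remaining chunks could be stored once and shared between the two versions.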

Furthermore, this could also
- help the management of upstream trees. This could be simplified, since the
"pristine tree" distributed as a tar.gz file and the exploded repo could share
their blobs, making tools such as pristine-tar unnecessary.
- help projects such as bup that currently need to provide split mechanisms of
their own.
- be used to add "different representations" to objects... for instance, when
storing a pdf one could use a fake split to store the corresponding text in a
separate blob, making the git-diff of pdfs almost instantaneous.

From Jeff's post, I guess that the major issue could be that the same file could
get a different sha1 as a multiblob versus a regular blob, but maybe it could be
possible to make the multiblob take the same sha1 as the "equivalent plain blob"
rather than its real hash.
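
As a rough sketch of what I mean by taking the sha1 of the "equivalent plain
blob": git hashes a blob as sha1 over the header "blob <size>\0" followed by
the content, so the same id can be computed by feeding the pieces of a
multiblob to the hash one after another, as long as the total size is known
up front. The function names blob_id and multiblob_id below are hypothetical
and exist only for this illustration; nothing like multiblob_id is in git.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object id git assigns to a plain blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def multiblob_id(chunks) -> str:
    """Hypothetical: id of the equivalent plain blob, computed piece by piece."""
    chunks = list(chunks)
    total = sum(len(c) for c in chunks)
    h = hashlib.sha1(b"blob %d\0" % total)
    for c in chunks:
        h.update(c)   # no need to concatenate the pieces in memory
    return h.hexdigest()


if __name__ == "__main__":
    pieces = [b"hello ", b"world\n"]
    print(blob_id(b"".join(pieces)) == multiblob_id(pieces))
```

The price is that the id no longer describes how the content is actually
stored, so the split itself would have to be recorded somewhere else.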

For the moment, I am just very curious about the idea and its possible pros and
cons. Can someone (maybe Jeff himself) tell me a little more? I also wonder
about the two possibilities (implementing it in git vs implementing it "on top
of" git).

Sergio
