Re: Multiblobs

2010/4/30 Hervé Cauwelier <herve@xxxxxxxxxx>:
> I'll obviously let the Git experts answer you, but I can answer about
> OpenDocument itself.
>
> In a presentation each slide is a <draw:page/> inside a single content.xml.
> So if you change one slide, the whole XML will serialize with a different
> SHA.
>
> And maybe you'll add style to that slide, or probably OpenOffice.org will
> generate an automatic style, so styles.xml will also change. Adding an image
> also changes manifest.xml, along with storing the image itself. OOo will
> surely record the last slide displayed when closing the application, so
> settings.xml will change too.
>
> So, all in all, for a single slide, 30 to 80 % of the Zip content may
> change.

Sure.  But if you name the chunks consistently, git's delta
compression can deal with tiny changes like those very easily.
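
A minimal sketch of what "name the chunks consistently" could look like, assuming the chunks are simply the zip members and the chunk name is the member path. The helper names (explode, make_zip, changed_members) and the sample member contents are made up for illustration; only the blob hashing follows git's actual object format. It shows that members whose uncompressed content did not change keep the same blob SHA, so only the touched ones become new objects.

```python
import hashlib
import io
import zipfile
from typing import Dict, List


def git_blob_sha(data: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def explode(zip_bytes: bytes) -> Dict[str, str]:
    """Map each zip member name to the blob SHA of its uncompressed content."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: git_blob_sha(zf.read(name)) for name in zf.namelist()}


def make_zip(members: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive (stand-in for a real .odp file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def changed_members(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Names whose blob SHA is new or different in the newer version."""
    return sorted(name for name in new if old.get(name) != new[name])


if __name__ == "__main__":
    # Hypothetical document: two versions that differ in one slide only.
    v1 = make_zip({
        "content.xml": b"<draw:page>slide 1</draw:page>",
        "styles.xml": b"<styles/>",
        "Pictures/logo.png": b"\x89PNG fake image bytes",
    })
    v2 = make_zip({
        "content.xml": b"<draw:page>slide 1, edited</draw:page>",
        "styles.xml": b"<styles/>",
        "Pictures/logo.png": b"\x89PNG fake image bytes",
    })
    print(changed_members(explode(v1), explode(v2)))
```

The changed members still need delta compression against their previous versions to be stored cheaply; the sketch only shows that the unchanged ones cost nothing extra.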

The question is whether it'll work equally well, or better, or worse,
with a one-big-file format.  I think we won't know this without doing
some actual tests.

(Normally, you could assume that one-big-file is the most
space-efficient storage format, because then xdelta and gzip have the
most data to work with.  But if you have a lot of *duplicated* content
inside the same file, and the distance between duplications is outside
the gzip window (32 KB), you could find that more unusual methods - like
the method used by bup - result in better compression.  I know this is
true for VM images, so it may be true for other things.  I haven't
tested everything :))
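
To make the window argument concrete, here is a rough sketch, not bup's code. It assumes the "more unusual method" is content-defined chunking plus storing each distinct chunk once, which is my reading of how bup works. For simplicity it recomputes a CRC32 over the last WINDOW bytes at every position instead of using a true rolling checksum, and WINDOW and MASK are arbitrary illustration values. The input is a random block repeated twice, with the copies 256 KB apart, well outside the 32 KB window.

```python
import hashlib
import random
import zlib
from typing import List

WINDOW = 48            # bytes hashed to decide on a boundary (assumed value)
MASK = (1 << 11) - 1   # boundary when the low 11 bits are zero: ~2 KB chunks


def split_chunks(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries.

    A boundary depends only on the WINDOW bytes before it, so identical
    content produces identical chunks wherever it sits in the file.
    """
    chunks = []
    start = 0
    for end in range(WINDOW, len(data) + 1):
        if (zlib.crc32(data[end - WINDOW:end]) & MASK) == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def deduped_size(data: bytes) -> int:
    """Bytes needed if every distinct chunk is stored once, keyed by SHA-1."""
    store = {hashlib.sha1(c).hexdigest(): c for c in split_chunks(data)}
    return sum(len(c) for c in store.values())


if __name__ == "__main__":
    rng = random.Random(0)
    block = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
    data = block + block  # second copy is 256 KB away from the first

    print("raw size:       ", len(data))
    print("zlib level 9:   ", len(zlib.compress(data, 9)))
    print("chunked + dedup:", deduped_size(data))
```

zlib cannot see the second copy, so its output stays close to the raw size, while the chunked store needs roughly one copy plus a couple of boundary chunks. Random data is the worst case for zlib on purpose here; real documents would compress better in both cases, which is why actual tests are still needed.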

> You may also be interested in the git-bigfiles project that was mentioned
> last week.
>
> http://caca.zoy.org/wiki/git-bigfiles

git-bigfiles is a worthwhile project.  Its goal of "make life
bearable" is aiming kind of low, though.  Basically they seem to want
git simply not to die horribly when given lots of large files.  This
is commendable, but the resulting repo will be very space-inefficient
when your large files change frequently in small ways.  So I think it
doesn't solve the problem Sergio brought up.

Have fun,

Avery