Re: Tracking OpenOffice files/other compressed files with Git

Johannes Sixt <j.sixt <at> viscovery.net> writes:

> 
> Peter Krefting schrieb:
> > Since OpenOffice documents are just zipped xml files, I wondered how
> > difficult it would be to create some hooks/hack git to track the files
> > inside the archives instead?
> 
> You could write a "clean" filter that "recompresses" the archive with
> level 0 upon git-add.
> 


A couple of notes:

1) For OpenOffice documents whose size is dominated by embedded images and
other large objects, the git delta mechanism already performs reasonably well,
since OO files are zip archives in which each file is compressed separately.
If you do not change an image, its compressed bytes are stored unchanged from
one save to the next, so the delta can still be found.

2) For OO documents whose size is dominated by plain content, the git delta
mechanism cannot work well, since the zip compression introduces "mixing": a
small change in the document is converted into a very large change in the
compressed data stored in the zip file.
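
The two notes above can be checked with a small experiment. The following is
only a sketch under stated assumptions: the member names (content.xml,
Pictures/img.bin), the generated text and the random "image" bytes are
stand-ins made up for illustration, and Python's zipfile module is used instead
of OO's own zip writer. It builds two archives that differ by one small edit
in the text member and compares the compressed bytes of each member.

```python
import io
import random
import struct
import zipfile

FIXED_TIME = (2000, 1, 1, 0, 0, 0)  # same timestamp in every archive


def make_archive(members):
    """Build a deflate-compressed zip from a {name: data} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def raw_member(archive, name):
    """Return the compressed bytes of one member exactly as stored in the zip."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)
    # Local file header: 30 fixed bytes, with name/extra lengths at offset 26.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    return archive[start:start + info.compress_size]


def common_prefix(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical document: some text plus an incompressible "image".
rng = random.Random(0)
words = ["git", "delta", "zip", "office", "image", "filter", "commit", "index"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode()
edited = text[:200] + b"CHANGED" + text[200:]            # one small edit
image = bytes(rng.getrandbits(8) for _ in range(50000))  # stand-in for a picture

before = make_archive({"content.xml": text, "Pictures/img.bin": image})
after = make_archive({"content.xml": edited, "Pictures/img.bin": image})

img_same = raw_member(before, "Pictures/img.bin") == raw_member(after, "Pictures/img.bin")
c_old = raw_member(before, "content.xml")
c_new = raw_member(after, "content.xml")

print("image member byte-identical:", img_same)
print("content.xml: %d of %d compressed bytes shared as a prefix"
      % (common_prefix(c_old, c_new), len(c_old)))
print("plain text:  %d of %d bytes shared as a prefix"
      % (common_prefix(text, edited), len(text)))
```

The untouched image member should come out byte-identical in both archives,
which is what lets the delta succeed in case 1. How far the two compressed
text streams stay in sync after the edit depends on the data and on the
compressor settings, so the numbers printed for content.xml are only
indicative of case 2.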

It should be possible to write a clean filter that uncompresses the archive
before commit. The catch is the complementary smudge filter to be used at
checkout. If you do not smudge properly, git always shows the file as changed
with respect to the index.  Smudging correctly would mean using exactly the
same compression level and compression method that OO uses, which can be a
little tricky. I have tried using the zip binary in both the clean and the
smudge phases and it does not work nicely: the smudged file is always different
from the original one. One should probably work at a lower level (libzip) to
have finer control over what is happening, and prepend to the uncompressed
file the compression parameters to be restored on smudging.
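
As a rough illustration of the clean half only, here is a minimal sketch that
rewrites an archive with every member stored uncompressed (the "level 0" idea
quoted at the top). It uses Python's zipfile module rather than libzip, which
is a substitution made for brevity. The function names clean and main are
hypothetical, and the sketch makes no attempt to record or restore OO's
original compression parameters, so it does not solve the smudge problem
described above. Member order and timestamps are copied, but extra fields and
comments are dropped.

```python
import io
import sys
import zipfile


def clean(data):
    """Rewrite a zip archive so that every member is stored uncompressed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():  # keep the original member order
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            dst.writestr(stored, src.read(info))
    return out.getvalue()


def main():
    """Filter entry point: git feeds the file on stdin and reads stdout."""
    sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))
```

A script built from this would end with a call to main(). Assuming it is
saved as oo_clean.py and the filter driver is called oozip (both names made
up here), it could be wired in with

    git config filter.oozip.clean "python3 oo_clean.py"

plus a line such as "*.odt filter=oozip" in .gitattributes. Without a matching
smudge filter, a checkout leaves the stored (larger) archive in the working
tree; whether OO is happy with that has not been tested here.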

The bigger issue, however, is that the clean/smudge step can be really slow
when dealing with large OO files.

