Re: Management of opendocument (openoffice.org) files in git

Following up on the discussion about tracking OpenOffice.org (oo) files, I
ran a minimal test. I simulated tracking an oo spreadsheet in which only a
few cells are filled in from one version to the next. These are the sizes
of the individual files:

48K     0.ods
48K     1.ods
60K     2.ods
60K     3.ods
56K     4.ods
64K     5.ods
68K     6.ods
64K     7.ods
64K     8.ods
68K     9.ods
600K    total

I then tracked this in four different ways, each in a fresh repo:

"packed": copy $i.ods to t.ods as is, git add t.ods and commit.
"unpacked": use the unzipped contents of $i.ods instead.
"rezip": use the rezipped version (compression 0, using Sergio's script).
"oofilter": use clean/smudge filters (calling Sergio's rezip)
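
To make the "rezip" idea concrete, here is a minimal sketch in Python. It is not Sergio's script (which I am not reproducing here); the function name rezip_stored and the fixed timestamp are assumptions made for illustration. It rewrites a zip archive so that every member is stored uncompressed, keeping the member order (so "mimetype" stays first) and dropping the per-member timestamps so that the output depends only on names and contents.

```python
import io
import zipfile

# Assumption for this sketch: timestamps are replaced by a fixed date so that
# the same contents always produce the same bytes.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def rezip_stored(data: bytes) -> bytes:
    """Return a copy of the zip archive in `data` with all members stored (compression 0)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
        # Iterate in the original order; ODF expects "mimetype" as first member.
        for info in src.infolist():
            member = zipfile.ZipInfo(info.filename, date_time=FIXED_DATE)
            member.compress_type = zipfile.ZIP_STORED
            member.external_attr = info.external_attr
            dst.writestr(member, src.read(info))
    return out.getvalue()
```

A clean filter would read the file from stdin, pass the bytes through something like rezip_stored, and write the result to stdout; git picks up such a command from the filter.<driver>.clean config key together with a matching "filter=<driver>" entry in .gitattributes. The smudge direction (recompressing on checkout) is not covered by this sketch.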

Here are the resulting sizes: first ".git/objects" as is, then
".git/objects" after "git repack -adf", and finally the total size of
.git plus the work tree (i.e. the last revision).

            .git/objects   .git/objects     .git + wt
            (as is)        (repack -adf)
packed      708K           492K             692K
unpacked    1.3M           144K             1.5M
rezip       992K           148K             1.4M
oofilter    984K           148K             352K
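
For anyone who wants to repeat the measurement without du, a small Python sketch along these lines would do. The function name tree_size is made up for this example, and it sums apparent file sizes rather than disk blocks, so the totals will not match du exactly.

```python
import os


def tree_size(root: str) -> int:
    """Sum the sizes in bytes of all regular files below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that a link is not counted as its target.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling tree_size(".git/objects") before and after repacking would give the first two figures for a method; tree_size(".") would give .git plus the work tree.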

Unsurprisingly, the total size is dominated by the work tree size if you
have few revisions. (Templates and the like also contribute.)
Note that git log --stat will report the sizes of the packed files in the
first case, but the sizes of the unpacked files in all other cases. In
particular, it reports a different size for the HEAD revision than the
one you have in a HEAD checkout.

I tried rewriting the "packed" repo after configuring the filters:
filter-branch refuses to work with a dirty work tree, even after
"checkout -f HEAD" and "reset --hard". It seems that git status is
permanently confused here. (Has anyone successfully rewritten existing
oo files?)

I'm not sure what the lessons are, but I wanted to share the numbers
anyway. I think this (your script and its usage) is heading in a useful
direction and should perhaps be made better known, if not made easier
from the git side. Also, I'm still looking for a good (deterministic) PDF
recompressor.

Michael

git version 1.6.0.2.426.g2cfa6
