Re: git repository size / compression

Jakub Narebski <jnareb@xxxxxxxxx> · Fri, 09 Sep 2011 07:54:55 -0700 (PDT)

neubyr <neubyr@xxxxxxxxx> writes:
> On Fri, Sep 9, 2011 at 3:23 AM, Carlos Martín Nieto <cmn@xxxxxxxx> wrote:
> > On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:

>>> I have a test git repository with just two files in it. One of the
>>> files has a set of two lines that is repeated n times,
>>> e.g.:
>>> {{{
>>> $ for i in {1..5}; do cat ./lexico.txt >> lexico1.txt && \
>>>     cat ./lexico.txt >> lexico1.txt && mv ./lexico1.txt ./lexico.txt; done
>>> }}}
>>>
>>
>> So you've just created some data that can be compressed quite
>> efficiently.
>>
>>> I ran the above command a few times and committed after each run. The
>>> disk usage of the repository directory is shown below: 419M is the
>>> working directory size and 2.7M is the git repository/database size.
>>>
>>> {{{
>>> $ du -h -d 1 .
>>> 2.7M    ./.git
>>> 419M    .
>>>
>>> }}}

Have you tried the same but with

   $ git gc --prune=now

before running `du`?
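
Something along these lines should let you compare the numbers before
and after repacking. It is only a sketch: the scratch repository, the
identity settings and the commit messages are made up for illustration,
and it uses `git count-objects -v` instead of `du` so that loose and
packed objects are reported separately.

```shell
#!/bin/sh
# Sketch: object counts and sizes before and after `git gc --prune=now`.
# The scratch repo and its contents are hypothetical, not the original data.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
git config commit.gpgsign false

printf 'line one\nline two\n' > lexico.txt
for run in 1 2 3; do
    # Double the file five times per run, then commit the result.
    for i in 1 2 3 4 5; do
        cat lexico.txt lexico.txt > lexico1.txt && mv lexico1.txt lexico.txt
    done
    git add lexico.txt
    git commit -q -m "run $run"
done

echo "before gc:"
git count-objects -v

git gc --quiet --prune=now

echo "after gc:"
git count-objects -v

# "count" is the number of loose objects, "packs" the number of packfiles.
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
```

The `size` line is the space taken by loose objects and `size-pack` the
space taken by packfiles, both in KiB.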

>>> Is it because of the compression performed by git before storing data
>>> (or before sending a commit)?
>>
>> Yes. Git stores its objects (the commit, the snapshot of the files,
>> etc.) compressed. When these objects are stored in a pack, the size can
>> be further reduced by storing some objects as deltas which describe the
>> difference between themselves and some other object in the object database.
> 
> Does git store deltas for some files? I thought it uses snapshots
> (exact copy of staged files) only.

Objects are first written as loose objects, each one a zlib-compressed
full snapshot. When git creates a packfile from loose objects (e.g. via
`git gc`), it does perform delta compression.
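
You can check which objects ended up as deltas with `git verify-pack -v`.
The following is a rough sketch on a throwaway repository (the file name,
contents and commit messages are invented for the example). In the
verbose listing, an entry stored as a delta carries two extra columns,
the delta depth and the name of its base object, so such lines have seven
fields instead of five.

```shell
#!/bin/sh
# Sketch: count deltified objects in the pack produced by `git gc`.
# The repo, file name and contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
git config commit.gpgsign false

# Several revisions of one file that differ only by a line at the end.
seq 1 2000 > numbers.txt
for i in 1 2 3 4; do
    echo "extra line $i" >> numbers.txt
    git add numbers.txt
    git commit -q -m "revision $i"
done

git gc --quiet --prune=now

# Non-delta entries: name type size size-in-pack offset            (5 fields)
# Delta entries:     name type size size-in-pack offset depth base (7 fields)
deltas=$(git verify-pack -v .git/objects/pack/pack-*.idx |
    awk 'NF == 7 {n++} END {print n+0}')
echo "deltified objects: $deltas"
```

Whether a given object is stored as a delta depends on git's packing
heuristics (window, depth, object sizes), so the count may vary between
repositories and git versions.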

-- 
Jakub Narębski