Re: git repository size / compression

2011/9/9 Jakub Narebski <jnareb@xxxxxxxxx>:
> neubyr <neubyr@xxxxxxxxx> writes:
>> On Fri, Sep 9, 2011 at 3:23 AM, Carlos Martín Nieto <cmn@xxxxxxxx> wrote:
>> > On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:
>
>>>> I have a test git repository with just two files in it. One of the
>>>> files has a set of two lines that is repeated n times.
>>>> e.g.:
>>>> {{{
>>>> $ for i in {1..5}; do cat ./lexico.txt >> lexico1.txt && cat
>>>> ./lexico.txt >> lexico1.txt && mv ./lexico1.txt ./lexico.txt; done
>>>> }}}
>>>>
>>>
>>> So you've just created some data that can be compressed quite
>>> efficiently.
>>>
>>>> I ran the above command a few times and committed after each run. The
>>>> disk usage of the repository directory is shown below: 419M is the
>>>> working directory size and 2.7M is the git repository/database size.
>>>>
>>>> {{{
>>>> $ du -h -d 1 .
>>>> 2.7M    ./.git
>>>> 419M    .
>>>>
>>>> }}}
>
> Have you tried the same but with
>
>   $ git gc --prune=now
>
> before running `du`?
>

Nope, I hadn't run git gc before. Here are the du results after running
it. The .git directory went from 2.7M to 924K, roughly two-thirds less
space. Great!

{{{
$ du -d 1 -h
924K    ./.git
417M    .
}}}
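
As a rough illustration of why such a repetitive file takes so little space in .git: git deflates its objects with zlib, and zlib handles long runs of repeated lines very well. The sketch below is only an approximation of that effect, not git's own code. The two-line pattern is made up, since the real contents of lexico.txt are not shown in this thread.

```python
import zlib

# Hypothetical two-line pattern; the actual lexico.txt contents
# are not given in the thread.
pattern = b"first example line\nsecond example line\n"

# Repeat the pattern many times, similar in spirit to the doubling loop above.
data = pattern * 100_000

# zlib is the compression library git uses for its objects.
compressed = zlib.compress(data)

print(f"raw:        {len(data)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(compressed) / len(data):.4%}")
```

The exact ratio depends on the content and the compression level, so the numbers will not match the du output above.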


>>>> Is it because of the compression performed by git before storing data
>>>> (or before sending a commit)?
>>>
>>> Yes. Git stores its objects (the commit, the snapshot of the files,
>>> etc.) compressed. When these objects are stored in a pack, the size can
>>> be further reduced by storing some objects as deltas, each describing
>>> the difference between that object and some other object in the object-db.
>>
>> Does git store deltas for some files? I thought it uses snapshots
>> (exact copies of staged files) only.
>
> When creating a packfile from loose objects (e.g. via `git gc`), git
> does perform delta compression.
>
> --
> Jakub Narębski
>
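
To make the "snapshot, but compressed" part concrete, here is a minimal sketch of how a single file snapshot (a blob) is stored as a loose object, assuming the default SHA-1 object format. `loose_blob` is a hypothetical helper name, not part of git. The sketch covers only loose objects; the delta compression done when building a packfile is not shown.

```python
import hashlib
import zlib
from typing import Tuple


def loose_blob(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, zlib-compressed bytes) for a blob stored loose.

    Hypothetical helper for illustration; assumes SHA-1 object ids.
    """
    # A blob is stored as the header "blob <size>\0" followed by the raw content.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    # The object id is the SHA-1 of header plus content (before compression).
    oid = hashlib.sha1(store).hexdigest()
    # What lands on disk under .git/objects/ is the zlib-deflated form.
    return oid, zlib.compress(store)


oid, packed = loose_blob(b"")
print(oid)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known empty-blob id
```

The id can be compared against `git hash-object` run on the same content. Each commit of a changed file adds a full compressed snapshot like this until `git gc` repacks the objects.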

Thank you, everyone, for explaining in detail.

--
neuby.r