Re: Curious about details of optimization of object database...

On Friday 2009 January 09 11:46:23 chris@xxxxxxxxxxxx wrote:
>I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
>tree.

It's even more than that.  A commit object contains its message, the author 
and committer details, the SHA of the tree, and zero or more SHAs for its 
parents.
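
To make that layout concrete, here is a minimal Python sketch that assembles a commit object by hand and hashes it the way git names objects (SHA-1 over a "<type> <size>" header, a NUL byte, and the body).  The identity string, timestamp, and commit messages are made up for illustration, and the helper names git_object_id and build_commit are mine, not git's.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Name an object as git does: SHA-1 of '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def build_commit(tree: str, parents: list, author: str,
                 committer: str, message: str) -> bytes:
    """Assemble the text of a commit object (hypothetical helper)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # zero or more parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message).encode()

# A tree with no entries has an empty body, so its id can be computed directly.
EMPTY_TREE = git_object_id("tree", b"")

# Made-up identity and timestamp, for illustration only.
who = "A U Thor <author@example.com> 1231523183 -0600"

root = build_commit(EMPTY_TREE, [], who, who, "Initial commit\n")
root_id = git_object_id("commit", root)
child = build_commit(EMPTY_TREE, [root_id], who, who, "Second commit\n")

print(root.decode())
print(child.decode())
```

Note that the commit only names a tree by SHA; it never embeds file contents.  In a real repository, "git cat-file -p <commit>" prints a commit in the same layout.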

>Can anyone say, in a few sentences, how git avoids needing to keep multiple
>slightly different copies of entire files without just storing lots of
>patches/diffs?

Loose objects can have large swaths of duplicated data.  However, git also 
supports storing objects in a packed format, which uses delta compression to 
reduce the duplication to close to nothing.
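
A rough Python sketch of why this helps.  It is NOT git's pack format (which stores copy/insert instructions against a base object); as a stand-in it uses zlib's preset-dictionary feature, so the second version of a file is compressed with the first version as context.  The file contents are synthetic and the helper names are hypothetical.

```python
import hashlib
import zlib

def make_file(n_lines: int) -> bytes:
    """Synthetic text that compresses poorly by itself: one hex digest per line."""
    return b"".join(
        hashlib.sha1(str(i).encode()).hexdigest().encode() + b"\n"
        for i in range(n_lines)
    )

def loose_size(data: bytes) -> int:
    """Size when compressed in isolation, as a loose object would be."""
    return len(zlib.compress(data))

def delta_pack(base: bytes, target: bytes) -> bytes:
    """Compress target using base as a preset dictionary (stand-in for a delta)."""
    c = zlib.compressobj(zdict=base)
    return c.compress(target) + c.flush()

def delta_unpack(base: bytes, packed: bytes) -> bytes:
    """Rebuild target from the base and the packed bytes."""
    d = zlib.decompressobj(zdict=base)
    return d.decompress(packed) + d.flush()

# Two versions of one file that differ in a single line.
v1 = make_file(200)
lines = v1.splitlines(keepends=True)
lines[100] = b"one edited line\n"
v2 = b"".join(lines)

packed = delta_pack(v1, v2)

print("v1 alone:       ", loose_size(v1), "bytes")
print("v2 alone:       ", loose_size(v2), "bytes")
print("v2 given v1:    ", len(packed), "bytes")
print("round trip ok:  ", delta_unpack(v1, packed) == v2)
```

Stored separately, both versions cost about the same; stored relative to the first, the second costs only a small fraction of that.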

Some examples:
Sizes are from "du -sh .git .".  The .git directory stores all the objects as 
well as the repository configuration, refs, reflogs, etc.  The . directory 
contains .git plus a clean checkout of master, so the checkout alone is 
roughly the difference between the two numbers.

The LinuxPMI (http://linuxpmi.org/) tree:
41M     .git
83M     .
(So, the storage is actually a bit smaller than the checkout, which is about 
83M - 41M = 42M; 984 objects; 140 commits)

A small project between me and my flatmates:
309K    .git
3.6M    .
(Here, the storage is significantly smaller than the checkout, which is 
roughly 3.3M; 786 objects; 155 commits)

My repository that tracks my dotfiles:
124K    .git
176K    .
(113 objects; 28 commits)
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@xxxxxxxxxxxxxxxxx                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     
