Re: EXT :Re: GIT and large files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am Dienstag, den 20.05.2014, 17:24 +0000 schrieb Stewart, Louis (IS):
> Thanks for the reply.  I just read the intro to GIT and I am concerned
> about the part that it will copy the whole repository to the developers
> work area.  They really just need the one directory and files under
> that one directory. The history has TBs of data.
> 
> Lou
> 
> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@xxxxxxxxx] 
> Sent: Tuesday, May 20, 2014 1:18 PM
> To: Stewart, Louis (IS)
> Cc: git@xxxxxxxxxxxxxxx
> Subject: EXT :Re: GIT and large files
> 
> "Stewart, Louis (IS)" <louis.stewart@xxxxxxx> writes:
> 
> > Can GIT handle versioning of large 20+ GB files in a directory?
> 
> I think you can "git add" such files, push/fetch histories that
> contains such files over the wire, and "git checkout" such files, but
> naturally reading, processing and writing 20+GB would take some time. 
> In order to run operations that need to see the changes, e.g. "git log
> -p", a real content-level merge, etc., you would also need sufficient
> memory because we do things in-core.

Preventing that a clone fetches the whole history can be done with the
--depth option of git clone.

The question is what do you want to do with these 20G files?
Just store them in the repo and *very* occasionally change them?
For that you need a 64bit compiled version of git with enough ram. 32G
does the trick here. Everything with git 1.9.1.

Doing some tests on my machine with a normal harddisc gives (sorry for
LC_ALL != C):
$time git add file.dat; time git commit -m "add file"; time git status

real    16m17.913s
user    13m3.965s
sys     0m22.461s
[master 15fa953] add file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file.dat

real    15m36.666s
user    13m26.962s
sys     0m16.185s
# Auf Branch master
nichts zu committen, Arbeitsverzeichnis unverändert

real    11m58.936s
user    11m50.300s
sys     0m5.468s

$ls -lh
-rw-r--r-- 1 thomas thomas 20G Mai 20 19:01 file.dat

So this works but aint fast.

Playing some tricks with --assume-unchanged helps here:
$git update-index --assume-unchanged file.dat
$time git status
# Auf Branch master
nichts zu committen, Arbeitsverzeichnis unverändert

real    0m0.003s
user    0m0.000s
sys     0m0.000s

This trick is only save if you *know* that file.dat does not change.

And btw I also set 
$cat .gitattributes 
*.dat -delta
as delta compresssion should be skipped in any case.

Pushing and pulling these files to and from a server needs some tweaking
on the server side, otherwise the occasional git gc might kill the box.
 
Btw. I happily have files with 1.5GB in my git repositories and also
change them. And also work with git for windows. So in this region of
file sizes things work quite well.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]