Hi Farshid,

On 24 July 2017 at 13:45, Farshid Zavareh <fhzavareh@xxxxxxxxx> wrote:
> I'll probably test this myself, but would modifying and committing a 4GB
> text file actually add 4GB to the repository's size? I anticipate that it
> won't, since Git keeps track of the changes only, instead of storing a copy
> of the whole file (whereas this is not the case with binary files, hence the
> need for LFS).

I decided to do a little test myself. I added three versions of the same
data set (sometimes slightly different cuts of the parent data set, which
I don't have), each between 2 and 4GB in size.

Each time I added a new version it added ~500MB to the repository, and
operations on the repository took 35-45 seconds to complete. Running
`git gc` compressed the objects fairly well, saving ~400MB of space. I
would imagine that even more space would be saved (proportionally) if
there were a lot more similar files in the repo.

The time to check out different commits didn't change much. I presume
that most of the time is spent copying the large file into the working
directory, but I didn't test that. I did test adding some other small
files, and sometimes it was slow (when cold I think?) and other times
fast.

Overall, I think as long as the files change rarely, and the repository
remains responsive, having these large files in the repository is ok.
They're still big, and if most people will never use them it will be
annoying for people to clone and check out updated versions of the files.
If you have a lot of the files, or they update often, or most people
don't need all the files, using something like LFS will help a lot.

$ git version # running on my windows machine at work
git version 2.6.3.windows.1

$ git init git-csv-test && cd git-csv-test
$ du -h --max-depth=2 # including here to compare after large data files are added
35K     ./.git/hooks
1.0K    ./.git/info
0       ./.git/objects
0       ./.git/refs
43K     ./.git
43K     .

$ git add data.txt # first version of the data file, 3.2 GB
$ git commit
$ du -h --max-depth=2 # the data gets compressed down to ~580M of objects in the git store
35K     ./.git/hooks
1.0K    ./.git/info
2.0K    ./.git/logs
580M    ./.git/objects
1.0K    ./.git/refs
581M    ./.git
3.7G    .

$ git add data.txt # second version of the data file, 3.6 GB
$ git commit
$ du -h --max-depth=1 # an extra ~520M of objects added
1.2G    ./.git
4.7G    .

$ time git add data.txt # 42.344s - third version of the data file, 2.2 GB
$ git commit # takes about 30 seconds to load editor
$ du -h --max-depth=1
1.7G    ./.git
3.9G    .

$ time git checkout HEAD^ # 36.509s
$ time git checkout HEAD^ # 44.658s
$ time git checkout master # 38.267s

$ git gc
$ du -h --max-depth=1
1.3G    ./.git
3.4G    .

$ time git checkout HEAD^ # 34.743s
$ time git checkout HEAD^ # 41.226s

Regards,

Andrew Ardill